ABSTRACT



It occasionally happens in economic analyses that the correctly 
specified model contains variables for which no observed data has been 
collected. When the data in a linear regression model are cross- 
sectional it is possible, under certain conditions on the nature of the 
variables, to estimate the independent effects of a specific set of 
explanatory variables on the dependent variable. A procedure for doing 
this is presented. 

A commonly used model of reenlistment behavior, for which the data 
base is cross-sectional, satisfies the requisite conditions. This 
permits the estimation of the independent effect of the military wage 
on reenlistment rate, as an illustration of the proposed procedure. 
I. INTRODUCTION



A. PRELIMINARY 

There is currently some concern about the enlistment and retention of 
men to serve in the armed forces in a draft-free environment. In defining 
the problem to be resolved, a number of studies (notably [1]) have attempt- 
ed to describe the factors which affect enlistment and reenlistment 
behavior. A large part of this interest is directed toward the
determination of a military wage structure which will ensure that civilians will
enlist, and that servicemen will reenlist, in sufficient numbers to meet 
service manpower requirements. This paper will concentrate on a part of 
this latter problem. Specifically, the purpose here is to estimate the 
elasticity of reenlistment rate with respect to military wage for first- 
term reenlistees in the Navy. Though studies of this kind have already 
been conducted, there are a number of reasons for additional study. Among 
them is that a new source of data (previously unused data in the form of 
BuPers Report ED198A for fiscal years 1964 through 1970) is used here, 
which is more complete than that used in prior studies. As a consequence 
of the availability of the new data, some omissions of previous studies 
may be corrected. But, most importantly, a somewhat novel procedure is
used to estimate the parameter of interest in what will later be introduced 
as the reenlistment model. 

B. BACKGROUND; DESCRIPTION OF THE DATA 

In the past, extensive reliance has been placed on the technique of
gathering information about reenlistment behavior by the use of surveys
of potential reenlistees. This technique depends on before-the-fact
information, which is in the form of the stated intentions of men facing
the decision to reenlist. Typically these surveys seek to determine, by
means of a question and response approach to the subjects, the factors 
which affect the reenlistment decision, and thus have value in indicating 
the lines along which quantitative research should be performed. That is, 
they serve primarily to identify those factors which should enter into an 
analytic model of reenlistment behavior. But once such a model is 
constructed, reliable quantitative results can only be obtained by investi- 
gating the observed behavior of potential reenlistees. This after-the-fact 
information, the revealed reenlistment behavior, is provided by the newly 
available data used in this paper. 

Data extracted from BuPers Report ED198A for use here have the form of 
pooled time series and cross-sectional information. In particular, the 
numbers of men eligible to reenlist and the numbers of these that do in 
fact reenlist are provided for each combination of 

(1) Pay grade: E-1 through E-9.

(2) Rate (a Navy skill or job specialty classification): BM, QM, ST, TM,
FT, MT, ET, DS, AT, AX, AQ, TD, SM, RD, RM, CT, AC, PT, HM, DT, DM, MU,
EA, AG, PH, YN, PN, DP, SK, DK, JO, PC, AK, AZ, GM, MN, IM, OM, EN, BT,
EM, IC, CM, AD, AO, AB, AE, AM, PR, LI, MR, SF, DC, PM, ML, CE, EO, BU,
SW, MT, CS, SH, SD, MM, AV, SP, BR, EQ, CU, SO, AW, AS.

(3) Mental Group: I, II, upper III, lower III, IV.

(4) Fiscal year of reenlistment: 1964 through 1970.

First-term reenlistments only are considered. (First-term reenlistments
are those of servicemen completing their initial term of active obligated
service.) Reenlistments beyond the first term are considerably less
interesting, since these advanced-term reenlistments typically involve
personnel already committed (psychologically) to a Navy career.
"Mental Group," a designation akin to IQ that is applied to enlisted
personnel, is determined by testing, as is intelligence quotient. As such
it is not likely to be highly reliable. Aside from the facility with 
which personnel in the higher mental groups may enter certain more tech- 
nical Rates, and the fact that it may be significant for an enlisted man 
who wishes to become an officer candidate, there is no special advantage 
or disadvantage accrued by designation as a member of any particular men- 
tal group. On the contrary, there is possibly even a tendency on the 
part of a certain group of men to score poorly, purposely, in the testing. 
This group would consist of some of the personnel of better than average 
education who have enlisted in the Navy, during the past few years of a
high level of military activity in Vietnam, to fulfill military service 
obligation and to avoid more hazardous duties. It is likely that some 
part of this group, in merely wishing to serve their required time in the 
armed forces, would seek to escape prominence in their enlisted service. 
There is, as a consequence, seemingly little general incentive to score 
well in Mental Group testing. In addition, testing for Mental Group clas- 
sification is subject to the same criticisms that have recently been 
directed at classical IQ testing: some minority groups may be put at a 
disadvantage by the biased (toward comprehensibility by white mid-Americans) 
nature of the test. In any case, classification by Mental Group is cer- 
tainly less reliable than cross-sectional classification by pay grade or 
Rate, or time series classification by fiscal year of reenlistment. As a 
consequence, the Mental Group classification will not be of primary interest 
here. 

Certain of the Rates included in the above report are unsuitable for 
inclusion in the analysis. Those Rates that are discarded from the data 
base are AV, SP, BR, EQ, CU, SO, AW, AS, MT, DS and SD. Any Rate not 
included in the study was disallowed for one of the following reasons:

1. The Rate consisted of pay grades E-7 through E-9 only; 

2. The Rate's membership consisted in large part of foreign nationals 
who could be expected to reenlist with high probability; 

3. Data for the Rate were not available for each of the fiscal years 
1964 through 1970. 

The fact that the data consists of a time series of cross-sections of 
revealed reenlistment behavior allows the correction of an omission of 
previous research. To date little effort has been made to establish a 
relationship between the variation over time of reenlistment behavior and 
the variation over time of pecuniary considerations facing the potential 
reenlistee. The time series of cross-sectional data provides a basis on 
which such a relationship can be constructed. The term "constructed" is 
used advisedly, since the pecuniary factors considered here are those 
imbedded in a particular model of reenlistment behavior. 

Another disadvantage of previous research has been that pecuniary 
factors for potential reenlistees have only been considered in coarse de- 
tail. The minuteness of the new cross-sectional data, on the other hand, 
permits a more precise formulation of the economic factors that face the 
individual potential reenlistee. These factors vary from man to man; they 
are dependent on the individual's level of proficiency (pay grade), job 
specialty (Rate), and fiscal year in which the reenlistment decision is 
made. 






II. THEORY UNDERLYING THE REENLISTMENT MODEL 



A. FOUNDATION 

The aim in this paper is to determine the rate of change of first- 
term Navy reenlistments with respect to the rate of change in military 
compensation. Toward this end a model is presented to describe 
reenlistment behavior, quantitatively represented by reenlistment rate, 
in terms of those variables which affect the reenlistment decision. 
Then, using the model as a basis the pure effect of the military wage 
on reenlistment rate is determined. Necessarily, the influence of all 
other variables must be removed in order to estimate the independent 
effect of the military wage.

B. TASTE AND OPPORTUNITY FACTORS. 

Consider an individual who is eligible to reenlist. The variables 
which affect his decision may be aggregated into three broad categories:
pecuniary, personal non-pecuniary and general non-pecuniary. The first
two of these categories are of interest in this section (the final 
category is discussed later). Within the first category are all 
factors which reflect opportunity (monetary) considerations. It 
includes such variables as expected basic military wage, benefits to 
servicemen which may be expressed equivalently in monetary terms, and 
the alternative civilian wage. Elements in the personal non-pecuniary 
class include such factors as military job satisfaction, agreeability 
with the quality of home life offered by Navy service, adaptability 
to the military hierarchy, and attitude towards sea or shipboard 
duty. Variables which are described as non-pecuniary are difficult to 
quantify. However, by employing the concept of reservation wage (for
a more complete discussion, see, for example, Gray [2]), the effect of
these purely individual non-pecuniary factors on the reenlistment deci- 
sion can be incorporated in a variable with analytic expression. The 
qualifying phrase "purely individual" is to be stressed. Just as 
factors which affect the reenlistment decision and which are unique to 
each individual can be identified, so can be recognized non-pecuniary 
factors affecting the reenlistment decision which are unique to each 
Rate, or to each pay grade, or to each year. Variables of this sort 
are the general non-pecuniary factors and will be introduced and 
treated later. This is accomplished by considering the pecuniary 
compensation that will just induce an individual to reenlist. The 
variables in the class of personal non-pecuniary factors can be viewed 
as elements which contribute to the determination of the value of 
compensation required to induce reenlistment. Knowledge of this level 
of compensation for an individual makes knowledge of the personal non- 
pecuniary factors affecting his reenlistment behavior redundant (at 
least in a study where interest centers on macroscopic reenlistment
behavior). As a consequence, the personal non-pecuniary variables
need not be explicitly considered,^1 since they are embedded in the
individual's reservation wage, which will now be defined. Suppose
that an individual deliberating reenlistment is capable of estimating 
the expected present value of his alternative courses of action: to 

^1 This is an advantage of the use of data describing revealed
reenlistment behavior: an individual's personal non-pecuniary attitudes
are inconsequential; the fact of his reenlistment displays that any
personal dislikes of the service were overcome by sufficient
compensation.
reenlist or not to reenlist. Let WM represent the present value of all
pecuniary returns if his choice is to reenlist, and let WC represent 
the present value of all pecuniary returns if he chooses not to reenlist. 
WM consists of two types of pecuniary returns. Most obviously there are 
those whose dollar value is fixed and is not subject to individual 
interpretation: basic pay, variable reenlistment bonus, basic allowance
for subsistence, clothing allowance. There are also pecuniary
returns whose dollar value is in large part subjectively determined by 
the individual: free medical services for the serviceman and his
dependents, Navy exchange and commissary privileges, and others. This
distinction is not negligible, and will be treated explicitly later.

For a serviceman on active duty, the determination of WC is not as 
straightforward as that of WM. Typically the serviceman may have little 
more than a rough estimate, in the year in which the reenlistment 
decision is made, of the mean wage received by civilians working in a 
job category similar to that of the serviceman and located in the
geographical area of interest to him. Now define WM/WC as the relative wage.
Then the reservation relative wage is defined as the value of the above 
ratio which will just induce the serviceman to reenlist. The individual 
will reenlist if his actual relative wage is greater than or equal to 
his reservation relative wage. Similarly, among the entire cohort of 
eligible reenlistees, those that reenlist will be those whose actual 
relative wage is greater than or equal to their reservation relative 
wage. Now consider the domain of possible values of reservation rela- 
tive wage. For each number in this domain, some portion of the eligible 
population will reenlist. As a consequence, the reenlistment rate 
(over the eligible population) has some functional expression over the 
domain of reservation relative wage. This introduces a variable of
fundamental importance in constructing an analytic expression for 
reenlistment rate. 
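
The mapping from relative wage to reenlistment rate can be sketched numerically. The following is a minimal illustration only: the reservation relative wages are hypothetical values chosen for the example, and the function name `reenlistment_rate` is not taken from the source. It computes the fraction of an eligible population whose reservation relative wage does not exceed the relative wage offered.

```python
from bisect import bisect_right


def reenlistment_rate(reservation_wages, relative_wage):
    """Fraction of the eligible population whose reservation relative
    wage is less than or equal to the offered relative wage WM/WC."""
    ordered = sorted(reservation_wages)
    # bisect_right returns the number of entries <= relative_wage
    return bisect_right(ordered, relative_wage) / len(ordered)


# Hypothetical reservation relative wages for ten eligible men.
reservation = [0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.5, 1.7, 2.0, 2.5]

# The rate is non-decreasing in the offered relative wage.
rates = [reenlistment_rate(reservation, w) for w in (0.5, 1.0, 1.5, 3.0)]
```

Under these assumed values the rate rises from zero to one as the offered relative wage increases, which is the functional dependence referred to above.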

The form of the functional dependence will be discussed later. It 
is worth noting here that an individual's reservation relative wage is
some fixed value of the ratio WM/WC. Presumably, an individual
considering reenlistment is able to estimate the expected present value of
pecuniary returns for not reenlisting, so his reservation relative wage 
can be equivalently expressed as the ratio of a sufficiently large value 
of expected present value of returns for reenlisting to his estimate of 
returns for not reenlisting. This says of course that for each 
individual the reservation wage uniquely determines a value of WM 
sufficiently large to induce reenlistment. As a consequence reenlist- 
ment rate, for fixed WC, has a functional representation over the 
domain of WM: for each value of WM a certain fraction of the eligible
population with given WC will reenlist. The implications of these
obvious comments are meant as a preliminary to later work. In order to
assure proper statistical control of the variables in the model, it is 
necessary to be able to match observations of reenlistment rate with 
corresponding relative wage. That is, a particular set of men eligible 
to reenlist faces a given relative wage (the members of this set who
reenlist in the face of this relative wage are those for whom this
relative wage is at least the reservation relative wage). This set of men
eligible to reenlist must be identifiable, for each observed relative 
wage, in order to be able to perform significant statistical analysis. 

By the preceding remarks, an equivalent necessary condition for proper
statistical control is that for any fixed value of WC it is possible to 
identify the set of men eligible to reenlist which corresponds to any value 
of WM. Or, for any value of WC and any value of WM, it is necessary to
be able to identify the appropriate corresponding eligible population. 

Now just as the purpose of this section was to eliminate the necessity 
of identifying, and including in the model, variables which are in the 
class of personal non-pecuniary factors, a purpose of a later section
will be to remove the requirement that the value of WC for a potential 
reenlistee be known. What will in effect be accomplished is that the 
variable WC will be removed from the model, so that a correspondence 
between reenlistment rate and WM only need be made in order to satisfy 
the functional requirement that reenlistment rate depends on relative 
wage and the statistical requirement that the appropriate eligible 
population be identifiable for given WM and WC. 

C. THE REENLISTMENT MODEL IN CROSS-SECTION AND TIME SERIES; OTHER
FACTORS AFFECTING REENLISTMENT RATE

In the preceding section, a model of the form R = f(WM/WC) was
postulated, where WM and WC are as previously defined and R represents
reenlistment rate. Fisher [3] and [4] first concluded that a model of
the form R = f(ln(WM/WC)) was indicated. Specifically, Fisher concluded
that the appropriate model was expressed by:

$R = \alpha + \beta \ln(WM/WC) + \varepsilon$,

a linear expression for R in ln(WM/WC), with disturbance term $\varepsilon$.
Later work, for example Nelson [5], employed a relation of the form:

(a) $\ln R = \alpha + \beta \ln(WM/WC) + Z + \varepsilon$,

where the term Z represents an additional set of variables which are
included in the model. The variables in Z depend, of course, on the
author of the study employing the model. A similar model in logit form,

(b) $\ln\left(\frac{R}{1-R}\right) = \alpha + \beta \ln(WM/WC) + Z + \varepsilon$,

has also been considered by, for example, Gray [2] and Wilburn [6].

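In this paper models of both forms (a) and (b) will be considered
for comparative purposes. Note that equations (a) and (b) may be
rewritten as:

(a') $\ln R = \alpha + \beta \ln WM - \beta \ln WC + Z + \varepsilon$,

(b') $\ln\left(\frac{R}{1-R}\right) = \alpha + \beta \ln WM - \beta \ln WC + Z + \varepsilon$.

Or:

(a'') $R = \alpha' \left(\frac{WM}{WC}\right)^{\beta} Z' \varepsilon'$,

(b'') $\frac{R}{1-R} = \alpha' \left(\frac{WM}{WC}\right)^{\beta} Z' \varepsilon'$,

where:

$\alpha' = \exp(\alpha)$, $Z' = \exp(Z)$, and $\varepsilon' = \exp(\varepsilon)$.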



These equations imply that, depending on which of the models (a) or
(b) is used, either $\ln R$ or $\ln\left(\frac{R}{1-R}\right)$^2 is linear in the natural log of
the ratio WM/WC (neglecting for the moment the effect of the variables
in Z). The implicit assumption is made, then, that the potential
reenlistee values the dollars in WM and in WC in constant ratio. That
is, the potential reenlistee is indifferent to an equal percentage
change in WM and in WC: his reenlistment decision remains the same
whether the relative wage offered him is the ratio $WM_1/WC_1$, or the
ratio $(1 + a)WM_1/((1 + a)WC_1)$ for any a (a may be positive, negative
or zero, representing an increase, decrease or lack of change
^2 Note that just as reenlistment rate R can be considered to be the
sample estimate of the probability of reenlisting, the ratio
R/(1-R) may be interpreted as the sample estimate of the odds of
reenlisting.
respectively in each of $WM_1$ and $WC_1$). This may not actually reflect the
candidate reenlistee's utility of dollars in WM and WC. The man may in
fact value a percentage increase in his civilian alternative wage WC
more highly than (or even less highly than) the same percentage increase in WM.
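
The invariance implied by models (a) and (b) is easy to check numerically. The short sketch below uses hypothetical dollar figures and an illustrative function name; it confirms that ln(WM/WC), the only channel through which the two wages enter models (a) and (b), is unchanged when both wages are scaled by the same factor (1 + a).

```python
import math


def log_relative_wage(wm, wc):
    """Natural log of the relative wage WM/WC."""
    return math.log(wm / wc)


# Hypothetical present values, in dollars, for illustration only.
wm, wc = 12000.0, 15000.0
baseline = log_relative_wage(wm, wc)

# Scale both wages by (1 + a) for a decrease, no change, and an increase.
checks = [
    math.isclose(log_relative_wage((1 + a) * wm, (1 + a) * wc), baseline)
    for a in (-0.10, 0.0, 0.25)
]
```

Each entry of `checks` is true, so under models (a) and (b) the predicted reenlistment rate cannot respond to an equal percentage change in both wages.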

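To relax this possibly erroneous assumption, the following
revisions to models (a) and (b) will be used:

(c) $\ln R = \alpha + \beta \ln\left(\frac{WM}{WC^{\delta}}\right) + Z + \varepsilon$,

(d) $\ln\left(\frac{R}{1-R}\right) = \alpha + \beta \ln\left(\frac{WM}{WC^{\delta}}\right) + Z + \varepsilon$.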




The parameter $\delta$ reflects the possibility that a potential reenlistee
values a percentage change in WM and the same percentage change in WC
differently. Presumably, the value of $\delta$ is positive. If this is the
case, then: if $\delta > 1$ a percentage change in WC is valued more highly
than the same percentage change in WM; if $\delta = 1$ equations (c) and (d)
become (a) and (b); if $0 < \delta < 1$ a percentage change in WM is valued
more highly than the same percentage change in WC; if $\delta = 0$ the
decision to reenlist is independent of the candidate reenlistee's civilian
alternative wage; a value of $\delta < 0$ indicates an aversion to civilian
dollars. These equations may be rewritten as:

(c') $\ln R = \alpha + \beta \ln WM + \gamma \ln WC + Z + \varepsilon$,

(d') $\ln\left(\frac{R}{1-R}\right) = \alpha + \beta \ln WM + \gamma \ln WC + Z + \varepsilon$,

where: $\gamma = -\beta\delta$.

If $\gamma = -\beta$, then the equations (c') and (d') become (a') and (b').
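
The step from (c) to (c') is a one-line expansion of the logarithm. The worked version below takes model (c) to be of the form $\ln R = \alpha + \beta \ln(WM/WC^{\delta}) + Z + \varepsilon$, which is an assumption here: it is the form implied by the stated relation between the coefficient on ln WC and the parameter multiplying it.

```latex
\begin{align*}
\ln R &= \alpha + \beta \ln\!\left(\frac{WM}{WC^{\delta}}\right) + Z + \varepsilon \\
      &= \alpha + \beta \ln WM - \beta\delta \ln WC + Z + \varepsilon \\
      &= \alpha + \beta \ln WM + \gamma \ln WC + Z + \varepsilon,
         \qquad \gamma = -\beta\delta .
\end{align*}
```

Setting $\delta = 1$ gives $\gamma = -\beta$, which recovers (a'); the same expansion applied to the log-odds gives (d') and, for $\delta = 1$, (b').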

The coefficient $\beta$ in the equations (c') and (d') is the parameter
of interest. In equation (c'), $\beta$ is the military wage elasticity of
reenlistment rate, since application of the partial differential operator
$\partial$ to (c'), while neglecting the disturbance term $\varepsilon$, yields:

$\partial(\ln R) = \beta\,\partial(\ln WM) + \gamma\,\partial(\ln WC) + \partial Z$;

or

$\partial R / R = \beta\,(\partial WM / WM) + \gamma\,\partial(\ln WC) + \partial Z$.

Similarly, in equation (d') $\beta$ represents the elasticity of the odds of
reenlistment with respect to military wage.
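
The elasticity interpretation can be verified by finite differences. The following sketch assumes the log-linear form (c') with Z and the disturbance set to zero; the parameter values and wage levels are hypothetical and chosen only so that the implied rate lies between zero and one.

```python
import math


def reenlistment_rate(wm, wc, alpha, beta, gamma):
    """R implied by the log-linear form (c'), with Z and the
    disturbance term set to zero."""
    return math.exp(alpha + beta * math.log(wm) + gamma * math.log(wc))


def wage_elasticity(wm, wc, alpha, beta, gamma, step=1e-6):
    """Central-difference estimate of d(ln R)/d(ln WM) at fixed WC."""
    up = reenlistment_rate(wm * math.exp(step), wc, alpha, beta, gamma)
    down = reenlistment_rate(wm * math.exp(-step), wc, alpha, beta, gamma)
    return (math.log(up) - math.log(down)) / (2 * step)


# Hypothetical parameter values and wages, for illustration only.
estimate = wage_elasticity(wm=12000.0, wc=15000.0,
                           alpha=-6.0, beta=2.0, gamma=-1.5)
```

The estimate agrees with the assumed coefficient on ln WM to within rounding error, whatever values are chosen for the other parameters.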

It is now appropriate to consider some assumptions about the nature 
of the cross-section and time series data. First, consider reenlistment 
behavior of cohorts of eligible reenlistees over time. It seems 
reasonable to assume that an individual deliberating reenlistment is 
unaffected by the past reenlistment behavior of others, and that his 
decision is also unaffected by past values of relative wage. Stated 
equivalently, this assumption is that the model contains no lagged 
values of reenlistment rate or relative wage. This is a simplifying
assumption; it is of course also possible to postulate and use a
model which contains lagged values of relative wage. Now consider the 
effect of the war in Vietnam on initial enlistments or of general 
civilian unemployment on reenlistments in the Navy. These are examples 
of temporal factors that can be expected to have a significant effect 
on initial enlistments (in the first case) or reenlistments (in the 
second case) in the Navy. It seems reasonable, then, that a variable 
reflecting such temporal factors should be included in the model. 
Similarly, a potential reenlistee who is a member of a certain Rate and 
is in a certain pay grade may be affected by factors peculiar to his 
Rate and pay grade, as well as to factors unique to the year in which 
the reenlistment decision is made. In particular, since enlisted men 
in higher pay grades typically enjoy greater prestige and increased 
personal liberty than men in the lower pay grades, it may be hypothesized 
that pay grade affects reenlistment rate in ways not expressible in
terms of pecuniary compensation, as well as in its contribution to WM. 

It cannot, then, be fairly assumed that factors which depend on Rate, 
pay grade or year of eligibility to reenlist do not separately influ- 
ence the reenlistment decision. As a consequence, variables represent- 
ing the influence of such factors will be included in the model. [Such 
variables are, in general, unobservable or not quantifiable. Their 
inclusion in the model is a formalism for the sake of completeness.] 

These factors are the general non-pecuniary factors whose existence was 
previously hypothesized. 

Note that nothing has yet been said about the influence of Mental 
Group on the reenlistment decision. It seems likely that personnel in 
different Mental Groups will reenlist at different rates. But
designation of an individual as a member of a particular Mental Group is a
somewhat less accurate, hence statistically less meaningful,
distinction than classification of personnel by Rate, pay grade or year
of reenlistment. Additionally, WM for a candidate reenlistee does not
depend on his Mental Group. [An individual's expected WC may, however,
depend on his Mental Group. If this is the case, it should emerge in
comparison of results for separate Mental Groups.] Hence, Mental Group
classification will not be used to define any of the variables of the 
model. Instead, the model to be constructed will be applied to all 
personnel in each of the Mental Groups separately. The results for the 
Mental Groups will then be statistically compared. 

Now consider a potential reenlistee viewing his military and civilian 
pecuniary alternatives. WM depends (in a manner to be made explicit
later) on his Rate and pay grade and on the year in which his current
enlistment expires. But typically the potential reenlistee's view of
his civilian alternatives is limited; he has been effectively isolated
from the civilian world and civilian labor market by the requirements 
of his military service. And, typically, it is likely that he has 
been unable to go job-seeking in the geographical area of interest to 
him for civilian life. So it may be realistic to suppose that the 
alternative civilian wage perceived by the potential reenlistee can be 
considered to be the median wage (or average wage) of the civilian 
population working in his skill category (craftsman, mechanical, elec- 
trical, clerical and so on) in the year in which he is eligible to 
reenlist. This will be taken as a formal assumption: the civilian 
alternative wage perceived by an individual in a given Mental Group 
depends only upon his Rate and the year in which the reenlistment 
decision is made. [This assumption may be faulty in that the alterna- 
tive civilian wage may also depend on the potential reenlistee’s 
military pay grade. That is, an advanced rank status in the military 
may promise higher pay in the civilian economy, since it may be 
interpreted as being equivalent to advanced expertise.] 

Since the assumption has been made that variables representing R,
WM and WC are not lagged in the model, the time series data in R, WM
and WC may be considered as another cross-section. Make, for the moment,^3
the stronger assumption that the model contains no lagged variables at
all. Then the time series, represented by year in which observations
are made, may be considered as another cross-section. Let the

^3 This assumption is made for the sake of simplicity of representation.
Later it will be seen that the assumption is not necessary;
equivalent results are obtained if it is not made. At the same time
it will be seen that the analogous assumption for the variables R,
WM and WC may be weakened somewhat: identical results will be
achieved even if the model contains lagged values of the variable
WC.






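subscripts i, j and t represent Rate, pay grade and year of reenlistment
eligibility. Then the equations (c') and (d') can be represented in
cross-section data as

(e) $\ln R_{ijt} = \alpha + \beta \ln WM_{ijt} + \gamma \ln WC_{it} + A_i + B_j + C_t + \varepsilon_{ijt}$,

(f) $\ln\left(\frac{R_{ijt}}{1-R_{ijt}}\right) = \alpha + \beta \ln WM_{ijt} + \gamma \ln WC_{it} + A_i + B_j + C_t + \varepsilon_{ijt}$,

where: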



$R_{ijt}$ is observed reenlistment rate for Rate i, pay grade j, year t;

$WM_{ijt}$ is military wage for Rate i, pay grade j, year t;

$WC_{it}$ is alternative civilian wage for Rate i in year t;

the variables $A_i$, $B_j$ and $C_t$ represent all factors which influence
reenlistment in, respectively, Rate i, pay grade j, or year t uniquely;

$\varepsilon_{ijt}$ is the disturbance term for the observation of $R_{ijt}$.

$A_i$, $B_j$ and $C_t$ are the variables whose introduction into the model
was promised earlier. Note that these variables are invariant over
subscripts not included in their notational expression. For example, the
factors represented by $C_t$ depend only on the year of reenlistment, and
are invariant over Rate and pay grade.

Note that a crucial assumption implicit in equations (e) and (f)
is that the variables $R_{ijt}$ and $WM_{ijt}$ are the only variables in the
model which are not invariant over at least one cross-sectional
dimension (for convenience, the set of all Rates considered in the
analysis will be referred to as a cross-sectional "dimension"; similarly
for the set of all years and the set of all pay grades considered).
Later work relies heavily on this assumption.

The models represented by equations (e) and (f) seem reasonably
complete with the introduction of the variables $A_i$, $B_j$ and $C_t$ as
"catch-all" categories to reflect all factors which influence reenlistment
depending on Rate, pay grade and year separately. But it is clear that
the inclusion of these variables creates a problem: quantification of
$A_i$, $B_j$ and $C_t$ is difficult if not impossible. Note that this problem is
indissoluble. The influence of such variables as $A_i$, $B_j$, $C_t$ and $WC_{it}$
on the decision of a potential reenlistee is almost certainly non-trivial.
Their effects cannot reasonably be ignored in any rational model of
first-term reenlistment behavior. One possible approach to resolving
this problem is to construct a model using dummy variables to represent
Rate, pay grade and year. But in the face of 61 Rates, nine pay grades
and seven years this may yield results too minutely specialized to be
interesting unless a certain amount of arbitrary aggregation (over
Rates, pay grades and years) is done. In any case, an alternative
procedure for ridding the models (e) and (f) of the effects of the
variables $A_i$, $B_j$ and $C_t$ will be used here. Use of this procedure is
also motivated by a desire to rid the model of the variable $WC_{it}$, the
civilian alternative wage, the method of measurement of which may be
subject to dispute.

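To specify the procedure, consider:

(e) $\ln R_{ijt} = \alpha + \beta \ln WM_{ijt} + \gamma \ln WC_{it} + A_i + B_j + C_t + \varepsilon_{ijt}$,

in "observed" data.

Taking the mean, for Rate i and pay grade j, over all years:

(e1) $\ln R_{ij\cdot} = \alpha + \beta \ln WM_{ij\cdot} + \gamma \ln WC_{i\cdot} + A_i + B_j + C_{\cdot} + \varepsilon_{ij\cdot}$,

where a dot in place of a subscript denotes the mean over that subscript.
For example,

$\ln R_{ij\cdot} = \frac{1}{T} \sum_{t=1}^{T} \ln R_{ijt}$

and

$\ln WC_{i\cdot} = \frac{1}{T} \sum_{t=1}^{T} \ln WC_{it}$,

for T = number of years considered in the data.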
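Taking the mean, for Rate i in year t, over all pay grades:

(e2) $\ln R_{i\cdot t} = \alpha + \beta \ln WM_{i\cdot t} + \gamma \ln WC_{it} + A_i + B_{\cdot} + C_t + \varepsilon_{i\cdot t}$.

Taking the mean, for pay grade j in year t, over all Rates:

(e3) $\ln R_{\cdot jt} = \alpha + \beta \ln WM_{\cdot jt} + \gamma \ln WC_{\cdot t} + A_{\cdot} + B_j + C_t + \varepsilon_{\cdot jt}$.

Taking the mean, for year t, over all Rates and pay grades:

(e4) $\ln R_{\cdot\cdot t} = \alpha + \beta \ln WM_{\cdot\cdot t} + \gamma \ln WC_{\cdot t} + A_{\cdot} + B_{\cdot} + C_t + \varepsilon_{\cdot\cdot t}$.

Taking the mean, for pay grade j, over all Rates and years:

(e5) $\ln R_{\cdot j\cdot} = \alpha + \beta \ln WM_{\cdot j\cdot} + \gamma \ln WC_{\cdot\cdot} + A_{\cdot} + B_j + C_{\cdot} + \varepsilon_{\cdot j\cdot}$.

Taking the mean, for Rate i, over all pay grades and years:

(e6) $\ln R_{i\cdot\cdot} = \alpha + \beta \ln WM_{i\cdot\cdot} + \gamma \ln WC_{i\cdot} + A_i + B_{\cdot} + C_{\cdot} + \varepsilon_{i\cdot\cdot}$.

Taking the grand mean:

(e7) $\ln R_{\cdot\cdot\cdot} = \alpha + \beta \ln WM_{\cdot\cdot\cdot} + \gamma \ln WC_{\cdot\cdot} + A_{\cdot} + B_{\cdot} + C_{\cdot} + \varepsilon_{\cdot\cdot\cdot}$.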

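Adding and subtracting,

(e) - (e1) - (e2) - (e3) + (e4) + (e5) + (e6) - (e7)

yields the equation:

$\ln R_{ijt} - \ln R_{ij\cdot} - \ln R_{i\cdot t} - \ln R_{\cdot jt} + \ln R_{i\cdot\cdot} + \ln R_{\cdot j\cdot} + \ln R_{\cdot\cdot t} - \ln R_{\cdot\cdot\cdot} =$

$\beta\,(\ln WM_{ijt} - \ln WM_{ij\cdot} - \ln WM_{i\cdot t} - \ln WM_{\cdot jt} + \ln WM_{i\cdot\cdot} + \ln WM_{\cdot j\cdot} + \ln WM_{\cdot\cdot t} - \ln WM_{\cdot\cdot\cdot}) +$

$\varepsilon_{ijt} - \varepsilon_{ij\cdot} - \varepsilon_{i\cdot t} - \varepsilon_{\cdot jt} + \varepsilon_{i\cdot\cdot} + \varepsilon_{\cdot j\cdot} + \varepsilon_{\cdot\cdot t} - \varepsilon_{\cdot\cdot\cdot}$.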

A similar result holds for the model represented by equation (f). 
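
The normalization can be sketched on synthetic data. In the example below the panel dimensions, parameter values and random draws are all hypothetical, the helper name `normalize` is illustrative, and the disturbance term is omitted so that the algebra can be checked exactly. Normalized ln R is regressed on normalized ln WM through the origin (the normalized data have zero mean by construction), and the slope reproduces the assumed coefficient even though the Rate, pay grade, year and civilian wage terms are never observed by the regression.

```python
import itertools
import random


def normalize(x, n_i, n_j, n_t):
    """Return x_ijt - x_ij. - x_i.t - x_.jt + x_i.. + x_.j. + x_..t - x_...
    for a dict x keyed by (i, j, t); a dot denotes the mean over that index."""
    def mean(keys):
        keys = list(keys)
        return sum(x[k] for k in keys) / len(keys)

    I, J, T = range(n_i), range(n_j), range(n_t)
    m_ij = {(i, j): mean((i, j, t) for t in T) for i in I for j in J}
    m_it = {(i, t): mean((i, j, t) for j in J) for i in I for t in T}
    m_jt = {(j, t): mean((i, j, t) for i in I) for j in J for t in T}
    m_i = {i: mean((i, j, t) for j in J for t in T) for i in I}
    m_j = {j: mean((i, j, t) for i in I for t in T) for j in J}
    m_t = {t: mean((i, j, t) for i in I for j in J) for t in T}
    grand = mean(x.keys())
    return {
        (i, j, t): x[i, j, t] - m_ij[i, j] - m_it[i, t] - m_jt[j, t]
        + m_i[i] + m_j[j] + m_t[t] - grand
        for i, j, t in itertools.product(I, J, T)
    }


# Synthetic panel: 5 Rates, 4 pay grades, 6 years (hypothetical sizes).
random.seed(0)
n_i, n_j, n_t = 5, 4, 6
alpha, beta, gamma = -9.5, 1.8, -0.9          # assumed "true" parameters
A = [random.gauss(0, 0.2) for _ in range(n_i)]  # unobserved Rate effects
B = [random.gauss(0, 0.2) for _ in range(n_j)]  # unobserved pay grade effects
C = [random.gauss(0, 0.2) for _ in range(n_t)]  # unobserved year effects
ln_wc = {(i, t): random.gauss(9, 0.3)
         for i in range(n_i) for t in range(n_t)}

ln_wm, ln_r = {}, {}
for i, j, t in itertools.product(range(n_i), range(n_j), range(n_t)):
    ln_wm[i, j, t] = random.gauss(9, 0.5)
    # Model (e) with the disturbance term omitted.
    ln_r[i, j, t] = (alpha + beta * ln_wm[i, j, t] + gamma * ln_wc[i, t]
                     + A[i] + B[j] + C[t])

y = normalize(ln_r, n_i, n_j, n_t)
x = normalize(ln_wm, n_i, n_j, n_t)

# Least-squares slope through the origin on the normalized data.
beta_hat = sum(x[k] * y[k] for k in x) / sum(x[k] ** 2 for k in x)
```

With a disturbance term added to the synthetic ln R, the slope would estimate the coefficient with sampling error rather than reproduce it exactly; the properties of that normalized disturbance are a separate question.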

This is the form of the data that will be used in a linear regression
to estimate the coefficient $\beta$. For want of more convenient
terminology, data in the form above will often be referred to as "normalized
data", while the initial values of each $\ln R_{ijt}$ and $\ln WM_{ijt}$ will be
called the "original data." In addition, the procedure of obtaining
normalized data from the original data will sometimes be called "the
model" when no ambiguity is possible. Some features of "the model" in
this sense are investigated in Section IV.

Now note that any variable which has fewer than three subscripts in
its notational expression disappears from the normalized form of the
data. A little reflection shows that lagged values of any such
variable are also purged in the normalized data. In particular this holds
for the variable $WC_{it}$. As a consequence, it is only necessary, in
order to obtain the identical equation in normalized data, to assure
that the model contains no lagged values of $R_{ijt}$ and $WM_{ijt}$.
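
As a worked check of this claim for the civilian wage term, collect the ln WC terms of equations (e) and (e1) through (e7) with the signs used in forming the normalized equation. The dot notation for means is the same as in those equations.

```latex
\begin{align*}
&\underbrace{\ln WC_{it}}_{(e)}
 - \underbrace{\ln WC_{i\cdot}}_{(e1)}
 - \underbrace{\ln WC_{it}}_{(e2)}
 - \underbrace{\ln WC_{\cdot t}}_{(e3)}
 + \underbrace{\ln WC_{\cdot t}}_{(e4)}
 + \underbrace{\ln WC_{\cdot\cdot}}_{(e5)}
 + \underbrace{\ln WC_{i\cdot}}_{(e6)}
 - \underbrace{\ln WC_{\cdot\cdot}}_{(e7)}
 = 0 .
\end{align*}
```

The cancellation uses only the fact that the variable does not vary with pay grade j, so that averaging over j leaves it unchanged. A lagged value such as $\ln WC_{i,t-1}$ has the same property and therefore cancels in the same way.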

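The question of the nature of the normalized disturbance term:

$\varepsilon_{ijt} - \varepsilon_{ij\cdot} - \varepsilon_{i\cdot t} - \varepsilon_{\cdot jt} + \varepsilon_{i\cdot\cdot} + \varepsilon_{\cdot j\cdot} + \varepsilon_{\cdot\cdot t} - \varepsilon_{\cdot\cdot\cdot}$

will be taken up later.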



D. THE CONSTRUCTION OF WM 

The measurement of WM used here is that proposed by Burton C. Gray 
in [13]. 

As mentioned previously, pecuniary compensation for reenlisting can 
be viewed as consisting of two types of remuneration: the actual wage 

received by the reenlistee and the value placed by the reenlistee on 
the peripheral benefits of military service. A component of the actual 
wage received by a reenlistee that is unique to first-term reenlistments
is the Variable Reenlistment Bonus (VRB). This bonus is a multiple
of the reenlistee's annual base pay (which in turn depends upon pay
grade) and varies from year to year and from Rate to Rate (depending
on the valuation placed on reenlistments in a given Rate in a given year).
Since fiscal year 1965, VRB has been the primary tool used to influence
reenlistments selectively (by Rate). Prior to FY 1965 all reenlistees
received a reenlistment bonus that was a fixed multiple of annual base
pay. Ideally, one would wish to evaluate the effect of VRB on first-term
reenlistment behavior. But the estimation of a single parameter of
interest is intended here simply to illustrate the fundamental goal of
this paper, an investigation of the consequences of using normalized
data, so this is not done. VRB enters the construction of WM as merely
another component.

Now consider the future of a reenlistee. He can reasonably expect 
promotion to a higher pay grade within his next term of enlistment, with 
a concurrent increase in pay. This expectation obviously influences the 
reenlistment decision (for it can be supposed that fewer men would 
reenlist without the promise of probable advancement in rank), but in 
a way difficult to specify. The simplifying assumption is made that 
this promise of increased future pay offsets the lesser valuation of 
future dollars. That is, in considering the present value of WM, the 
potential reenlistee employs a discount rate of zero. 

A final assumption, due to the nature of the available data base, 
is made. For want of other information, it is assumed that all 
reenlistments are made for an obligation of four years. 

With the preceding paragraphs in mind, it is possible to postulate
the following construction:

\[
WM = 4C + P\left[\frac{1 + VRB}{3} + 4(1 + K)\right]
\]



where, for a potential reenlistee,

WM is the present value of military wage for a four-year
reenlistment (at a zero discount rate),

P is the reenlistee's annual base pay,

VRB is the appropriate Variable Reenlistment Bonus multiple,

C is a constant representing the monetary valuation of the
peripheral benefits of military service for a four-year
reenlistment,

K is a dimensionless multiplicative constant representing the
valuation of those benefits associated with military
service that can be expected to increase with annual base
pay. K is intended to reflect such elements as tax
advantages, allowances and commissary and exchange benefits,
whose value increases as base pay increases. 
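
As a small worked illustration, the construction can be written as a function. This is only a sketch, assuming the bracketed form WM = 4C + P[(1 + VRB)/3 + 4(1 + K)] for a four-year reenlistment; the function name `military_wage` and the base pay and VRB figures used below are hypothetical.

```python
def military_wage(base_pay, vrb_multiple, c, k):
    """Present value of military wage for a four-year reenlistment,
    at a zero discount rate: WM = 4C + P[(1 + VRB)/3 + 4(1 + K)]."""
    bonus = base_pay * (1.0 + vrb_multiple) / 3.0   # reenlistment bonus incl. VRB
    pay_and_scaled_benefits = 4.0 * base_pay * (1.0 + k)
    fixed_benefits = 4.0 * c
    return fixed_benefits + bonus + pay_and_scaled_benefits

# Hypothetical inputs: annual base pay of 3000 and a VRB multiple of 2.
wm = military_wage(base_pay=3000.0, vrb_multiple=2.0, c=500.0, k=0.10)
```

Raising C shifts WM by a constant, while raising K scales the part of WM that is proportional to base pay, which is why the two constants can affect the estimated elasticity differently.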

This may be rewritten, for Rate i, pay grade j and year t, as:

\[
WM_{ijt} = 4C + P_{ijt}\left[\frac{1 + VRB_{ijt}}{3} + 4(1 + K)\right]
\]

The construction of WM allows freedom for parameterization of the
constants C and K. In order to get an idea of the sensitivity of the
coefficient $\beta$ to changes in assumed C and K, regression analyses are
performed for various presumably reasonable values of these constants.

III. APPLICATION



A. PRELIMINARY 

Consider the consequences of applying the natural logarithm
transformation to the variables $R_{ijt}$ and $WM_{ijt}$. These variables
have respective ranges of values of $[0,1]$ and $[0,\infty)$, which under
the natural logarithm transformation become $(-\infty,0]$ and
$(-\infty,\infty)$. Thus this transformation avoids the awkward situation
of having a finite range of values on the dependent variable (in the case
of $R_{ijt}$) in a linear regression analysis. But there is a limitation
associated with the use of the logarithmic transformation: the logarithm
of a reenlistment rate of zero is undefined. Hence in the model represented
by equation (e) of the preceding section, no observations of zero
reenlistment rate can be allowed. Additionally, in the model represented
by equation (f), a reenlistment rate equal to one must be disallowed,
since this corresponds to an infinitely large value of the odds of
reenlistment. Accordingly, since it is desirable to use the same data
base for each of the models (e) and (f), any observations of reenlistment
rate equal to zero or one will be discarded. This is not felt to
restrict the analysis too severely since reenlistment rates of zero or
one, the extreme values of the data, typically correspond to
extraordinary classes of reenlistees. In particular, reenlistment rates of
zero are most common in very low pay grades and reenlistment rates of
one are usually observed in the highest pay grades. This suggests that
a zero reenlistment rate can usually be associated with a class of men
who show an unsuitability for military service, while a reenlistment
rate equal to one can usually be associated with the class of men who
thrive in the military. Neither of these classes is particularly
interesting for a study of general reenlistment behavior.
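
The screening rule above can be sketched as follows. This is a minimal illustration, assuming model (e) uses ln R and model (f) uses the log of the odds R/(1 - R); the function name `log_transforms` and the sample rates are hypothetical.

```python
import numpy as np

def log_transforms(rates):
    """Drop rates equal to 0 or 1, then return the retained-observation mask,
    ln R (model (e)) and ln[R / (1 - R)] (model (f))."""
    r = np.asarray(rates, dtype=float)
    keep = (r > 0.0) & (r < 1.0)       # both transforms are finite only here
    r = r[keep]
    return keep, np.log(r), np.log(r / (1.0 - r))

keep, ln_r, ln_odds = log_transforms([0.0, 0.25, 0.5, 1.0])
```

Using one mask for both transforms keeps the data base identical across models (e) and (f), as the text requires.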

Now suppose that in models (e) and (f) the error terms
$\varepsilon_{ijt}$ are independent, identically distributed Normal random
variables, each with mean zero and variance $\sigma^2$. Then the
application of ordinary least squares procedures to estimate the
coefficient $\beta$ in the normalized form of model (e),

\[
\begin{aligned}
&\ln R_{ijt} - \ln R_{ij\cdot} - \ln R_{i\cdot t} - \ln R_{\cdot jt}
 + \ln R_{i\cdot\cdot} + \ln R_{\cdot j\cdot} + \ln R_{\cdot\cdot t} - \ln R_{\cdot\cdot\cdot} \\
&\quad = \beta\,(\ln WM_{ijt} - \ln WM_{ij\cdot} - \ln WM_{i\cdot t} - \ln WM_{\cdot jt}
 + \ln WM_{i\cdot\cdot} + \ln WM_{\cdot j\cdot} + \ln WM_{\cdot\cdot t} - \ln WM_{\cdot\cdot\cdot}) \\
&\qquad + \varepsilon_{ijt} - \varepsilon_{ij\cdot} - \varepsilon_{i\cdot t} - \varepsilon_{\cdot jt}
 + \varepsilon_{i\cdot\cdot} + \varepsilon_{\cdot j\cdot} + \varepsilon_{\cdot\cdot t} - \varepsilon_{\cdot\cdot\cdot},
\end{aligned}
\]

yields an unbiased estimator for this coefficient. The same is true for
ordinary least squares estimation of $\beta$ in the normalized form of model
(f). These assertions will be proved in Section IV, where it will also
be shown that the above assumption about the distribution of the
disturbance terms $\varepsilon_{ijt}$ may be relaxed somewhat.


B. VALUES FOR PARAMETERIZED C AND K 

Regression analyses were performed for each combination of the
following selected values of the constants C and K:

       C         K

      500      0.10
     1000      0.15
     1500      0.20
     2000

It is felt that these selected values represent a range broad enough to
include realistic possible values of the constants.



C. THE REGRESSION ANALYSES 

In addition to estimating the coefficient $\beta$ in the normalized forms
of the models (e) and (f), it may be interesting (for comparative
purposes) to estimate $\beta$ in the equations:

\[
\text{(g)}\qquad \ln R_{ijt} = \alpha + \beta \ln WM_{ijt} + \varepsilon_{ijt},
\]

\[
\text{(h)}\qquad \ln\frac{R_{ijt}}{1 - R_{ijt}} = \alpha + \beta \ln WM_{ijt} + \varepsilon_{ijt},
\]

where it is assumed that the $\varepsilon_{ijt}$ are independent, identically
distributed Normal random variables with mean zero and variance $\sigma^2$.

Note that these latter equations are truncated forms of the models
(e) and (f): the variables $WC_{it}$, $A_i$, $B_j$, $C_t$ are neglected.

Four selections for the value of the constant C and three choices
for the constant K yield 12 different constructions of WM. Regression
analyses are conducted for each of these constructions of WM, using
models (e) (normalized), (f) (normalized), (g) and (h) for each of five
Mental Groups. This produces 240 least squares estimations to be
considered. Results for one construction of WM for models (e) (normalized),
(f) (normalized), (g) and (h) and each of the five Mental Group
classifications are looked at in detail in this section. Less detailed
regression analysis results for the remaining 11 constructions of WM
are given in Appendix A in tabular form.

Now consider Table I, which gives summary results for the construction
of WM using C = 500 and K = 0.10. Denote Mental Groups I, II,
upper III, lower III and IV as Mental Groups 1, 2, 3, 4 and 5 respectively.

                                 Table I

                           B        SE         t     $\hat\sigma^2$     R       N

Normalized Model (e)
   MG 1                 1.17260  0.26011   4.49983   0.19601   0.1904    720
   MG 2                 1.76626  0.17863   9.90073   0.15014   0.3070   1259
   MG 3                 1.84425  0.21828   8.44902   0.17024   0.2956    996
   MG 4                 1.34492  0.20119   6.68474   0.15299   0.2629    805
   MG 5                 1.50907  0.28158   5.35927   0.13337   0.2601    530

Normalized Model (f)
   MG 1                 1.87660  0.36445   5.14912   0.38339   0.2167    720
   MG 2                 2.72210  0.24978  10.89793   0.29433   0.3346   1259
   MG 3                 2.61042  0.30134   8.66258   0.32445   0.3025    996
   MG 4                 2.00364  0.28072   7.13740   0.29784   0.2793    805
   MG 5                 2.16256  0.39745   5.44106   0.26571   0.2638    530

Model (g)
   MG 1                 1.36861  0.12644  10.82445   0.59642   0.3746    720
   MG 2                 1.91656  0.09547  20.07401   0.65793   0.4927   1259
   MG 3                 1.58111  0.11230  14.07977   0.62178   0.4078    996
   MG 4                 1.44961  0.12798  11.32667   0.62849   0.3712    805
   MG 5                 1.54090  0.14984  10.28386   0.46204   0.4085    530

Model (h)
   MG 1                 1.85354  0.17451  10.62108   1.13624   0.3685    720
   MG 2                 2.70828  0.13598  19.91696   1.33460   0.4898   1259
   MG 3                 2.05608  0.15295  13.44309   1.15332   0.3922    996
   MG 4                 1.93526  0.17588  11.00301   1.18701   0.3620    805
   MG 5                 2.11862  0.21826   9.70676   0.98037   0.3891    530

Let B denote the estimate for $\beta$, SE represent the standard error of the
estimate of $\beta$, t represent the computed t-statistic, $\hat\sigma^2$ be
the estimate of the variance $\sigma^2$, R be the multiple correlation
coefficient and N represent the number of observations of $R_{ijt}$. [It
will be shown in Section IV that $\hat\sigma^2$ is an unbiased estimator
for $\sigma^2$.] Note that the computed values of the t-statistic indicate
that in each of the twenty least squares estimations of $\beta$ represented
in Table I the estimated coefficient is significantly different from zero.
But also note that in comparing results for the normalized models (e) and
(f) and the corresponding truncated non-normalized models (g) and (h), the
following differences are consistently true for each Mental Group:

1. The values of computed t-statistic for models (g) and (h) 
are greater than the values for models (e) and (f). 

2. The standard error of the estimate is less for models (g)
and (h) than for models (e) and (f).

3. The multiple correlation coefficient R is greater for 
models (g) and (h) than for models (e) and (f). 

These considerations might seem to indicate that models (g) and (h)
fit the data better than the corresponding normalized forms of models
(e) and (f). But in reality the results 1., 2., and 3. are not
particularly surprising, since the computed value of t is directly
proportional to, and the computed value of SE inversely proportional to,
the square root of the sum of squared deviations from the mean of the
explanatory variable, while $1 - R^2$ is inversely proportional to the sum
of squared deviations from the mean of the dependent variable. That is,
for a single explanatory variable with observed values $x_i$,
$i = 1, \ldots, n$, and a dependent variable with observed values $y_i$,
$i = 1, \ldots, n$,

\[
SE = \left[\frac{\hat\sigma^2}{\sum_{i=1}^{n} (x_i - \bar x)^2}\right]^{1/2},
\qquad
t = \frac{B}{SE},
\]

and:

\[
R^2 = 1 - \frac{\sum_{i=1}^{n} \bigl(y_i - \bar y - B(x_i - \bar x)\bigr)^2}{\sum_{i=1}^{n} (y_i - \bar y)^2},
\]

where:

\[
\bar y = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad \bar x = \frac{1}{n}\sum_{i=1}^{n} x_i,
\]

B is the estimated regression coefficient, and $\hat\sigma^2$ is the
estimate of $\sigma^2$. Hence as the sums of squared deviations from the
mean of both the explanatory variable and the dependent variable decrease,
it is to be anticipated that SE and $1 - R^2$ will increase and the computed
t-statistic will decrease. To see how this fact yields the results in
comparisons 1., 2., and 3. above, consider the explanatory and dependent
variables of the models (e) (normalized) and (g). Dropping for a moment the
logarithm symbol, model (e) (normalized) has dependent variable:

\[
R_{ijt} - R_{ij\cdot} - R_{i\cdot t} - R_{\cdot jt} + R_{i\cdot\cdot} + R_{\cdot j\cdot} + R_{\cdot\cdot t} - R_{\cdot\cdot\cdot}
\]

and explanatory variable:

\[
WM_{ijt} - WM_{ij\cdot} - WM_{i\cdot t} - WM_{\cdot jt} + WM_{i\cdot\cdot} + WM_{\cdot j\cdot} + WM_{\cdot\cdot t} - WM_{\cdot\cdot\cdot},
\]


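both of which have mean zero, while model (g) has dependent variable
$R_{ijt}$ and explanatory variable $WM_{ijt}$. Taking squared deviations
from the mean for the variable $R_{ijt}$:

\[
\begin{aligned}
\sum_i \sum_j \sum_t (R_{ijt} - R_{\cdot\cdot\cdot})^2
&= JT \sum_i (R_{i\cdot\cdot} - R_{\cdot\cdot\cdot})^2
 + IT \sum_j (R_{\cdot j\cdot} - R_{\cdot\cdot\cdot})^2
 + IJ \sum_t (R_{\cdot\cdot t} - R_{\cdot\cdot\cdot})^2 \\
&\quad + T \sum_i \sum_j (R_{ij\cdot} - R_{i\cdot\cdot} - R_{\cdot j\cdot} + R_{\cdot\cdot\cdot})^2
 + J \sum_i \sum_t (R_{i\cdot t} - R_{i\cdot\cdot} - R_{\cdot\cdot t} + R_{\cdot\cdot\cdot})^2 \\
&\quad + I \sum_j \sum_t (R_{\cdot jt} - R_{\cdot j\cdot} - R_{\cdot\cdot t} + R_{\cdot\cdot\cdot})^2 \\
&\quad + \sum_i \sum_j \sum_t (R_{ijt} - R_{ij\cdot} - R_{i\cdot t} - R_{\cdot jt}
 + R_{i\cdot\cdot} + R_{\cdot j\cdot} + R_{\cdot\cdot t} - R_{\cdot\cdot\cdot})^2 \\
&\ge \sum_i \sum_j \sum_t (R_{ijt} - R_{ij\cdot} - R_{i\cdot t} - R_{\cdot jt}
 + R_{i\cdot\cdot} + R_{\cdot j\cdot} + R_{\cdot\cdot t} - R_{\cdot\cdot\cdot})^2
\end{aligned}
\]

since all terms in the above equation are non-negative. But the term
on the right hand side of this inequality is the sum of squared
deviations from the mean of the dependent variable in the normalized form of
model (e). A similar result holds in the comparison of the sum of
squared deviations from the mean of the explanatory variables in
models (e) (normalized) and (g). And a similar result holds in the
comparison of the models (f) (normalized) and (h) as well. As a
consequence, the results of comparisons 1., 2., and 3. are not unexpected.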
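
The inequality can be checked numerically. The sketch below assumes a complete I x J x T layout and uses simulated values in place of ln R; it verifies that the total sum of squared deviations splits into main-effect, two-way and three-way (normalized) parts, so that the normalized sum of squares can never exceed the total. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
I, J, T = 5, 4, 3
r = rng.normal(size=(I, J, T))          # simulated stand-in for ln R_ijt

g = r.mean()
r_i = r.mean(axis=(1, 2), keepdims=True)
r_j = r.mean(axis=(0, 2), keepdims=True)
r_t = r.mean(axis=(0, 1), keepdims=True)
r_ij = r.mean(axis=2, keepdims=True)
r_it = r.mean(axis=1, keepdims=True)
r_jt = r.mean(axis=0, keepdims=True)

def ss(component):
    """Sum of squares of a component, counted once per (i, j, t) cell."""
    return float((np.broadcast_to(component, r.shape) ** 2).sum())

lower_order = [
    ss(r_i - g), ss(r_j - g), ss(r_t - g),                 # main effects
    ss(r_ij - r_i - r_j + g), ss(r_it - r_i - r_t + g),    # two-way terms
    ss(r_jt - r_j - r_t + g),
]
normalized = r - r_ij - r_it - r_jt + r_i + r_j + r_t - g
ss_total = ss(r - g)
ss_normalized = ss(normalized)
```

Because every lower-order part is a sum of squares, `ss_normalized` equals `ss_total` only when all main effects and two-way terms vanish.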

Now consider the estimates of $\beta$ presented in Table I. All estimates
of the military wage elasticity of the odds of reenlistment and the
probability of reenlistment exceed one. In fact, the estimates of the
elasticity of R with respect to WM cluster loosely about a value of 1.5,
while the estimates of the elasticity of $R/(1-R)$ with respect to WM have
a median value of approximately 2. Since these estimates are based on a
single choice for the construction of WM no great import will be assigned
to them, except to note that they are not appreciably different from
estimates of these quantities obtained in other studies. For example,
estimates of the WM elasticity of R in previous studies are generally
confined to the range 0.8 to 3, with the bulk of the estimates lying
in a range of values between 1 and 2. Note that in the normalized forms
of models (e) and (f) the estimates of $\beta$ for Mental Groups II and upper
III seem to be appreciably higher than estimates of this coefficient
for Mental Groups I, lower III and IV (this apparent difference is
not so marked for models (g) and (h); in any case models (g) and (h)
are of interest here only for a comparison of results with the
corresponding normalized forms of models (e) and (f), so that the former
models will not be treated further). This result agrees very well with
prior expectations: it indicates that personnel in the highest and
lowest Mental Groups are less inclined toward reenlistment than men in
the median Mental Groups. It can be argued that this result is
reasonable since men in Mental Group I, who presumably possess greater
intellectual ability, may find greater rewards and challenges in civilian
life than in enlisted military service, while men in Mental Groups lower
III and IV may often find themselves unable to compete for advancement
successfully with men in higher Mental Groups, and may sometimes be
unable to meet demands of competence placed on them by military service.
For both the highest and lowest Mental Groups, then, enlisted military
service may be viewed as limited in opportunity. To establish the
validity of these initial observations it is desirable to determine if
the estimates B contained in Table I do in fact estimate different
coefficients $\beta$ for different Mental Groups (that is, whether the
same coefficient $\beta$ applies for all Mental Groups or whether different
coefficients $\beta_i$ apply for different Mental Groups).

Toward this end a statistical test, in which the estimates B may
be compared for each pair of Mental Groups in each of the models (e)
(normalized) and (f) (normalized), is in order. Concentrate now on
the normalized form of model (e). For the regression analysis of
Mental Group i, i = 1, ..., 5, let $\hat\sigma_i^2$ be the estimate of
$\sigma^2$, $B_i$ be the estimate of $\beta_i$, and $n_i$ be the number of
observations. Since the estimated intercept for each least squares
estimation using the normalized form of model (e) is zero, testing for the
equality of the coefficients $\beta_i$ is equivalent to testing for the
equality of the appropriate regression lines. Now if Mental Groups i and j
yield the same regression line in the normalized form of model (e), then
$\hat\sigma_i^2$ and $\hat\sigma_j^2$ both estimate the same variance
$\sigma^2$. And in this case,

\[
\left[\frac{(I-1)(J-1)(T-1)\,n_i}{IJT} - 1\right]\frac{\hat\sigma_i^2}{\sigma^2} \sim \chi^2
\quad\text{with}\quad \frac{(I-1)(J-1)(T-1)\,n_i}{IJT} - 1 \ \text{degrees of freedom,}
\]

and

\[
\left[\frac{(I-1)(J-1)(T-1)\,n_j}{IJT} - 1\right]\frac{\hat\sigma_j^2}{\sigma^2} \sim \chi^2
\quad\text{with}\quad \frac{(I-1)(J-1)(T-1)\,n_j}{IJT} - 1 \ \text{degrees of freedom,}
\]

where these two Chi-squared random variables are independent since they
are derived from two different (and assumed independent) populations of
random variables. [See Section IV for the development of this assertion.
Here I = 61 is the number of Rates, J = 9 is the number of pay
grades and T = 7 is the number of years considered.]

Hence as the sum of two independent $\chi^2$ random variables, the
quantity:

\[
\frac{1}{\sigma^2}\left[\frac{(I-1)(J-1)(T-1)}{IJT}\,(n_i \hat\sigma_i^2 + n_j \hat\sigma_j^2) - (\hat\sigma_i^2 + \hat\sigma_j^2)\right]
\]

has $\chi^2$ distribution with:

\[
\frac{(I-1)(J-1)(T-1)}{IJT}\,(n_i + n_j) - 2
\]

degrees of freedom. Now if Mental Groups i and j yield the same
regression line then $\beta_i - \beta_j = 0$, in which case $B_i - B_j$ is
Normally distributed with mean zero (since $B_i$ and $B_j$ are unbiased
estimators of $\beta_i = \beta_j$) and variance:

\[
\mathrm{Var}(B_i - B_j) = \mathrm{Var}(B_i) + \mathrm{Var}(B_j)
= \frac{\sigma^2}{\sum_{k=1}^{n_i} (x_k^i - \bar x^i)^2} + \frac{\sigma^2}{\sum_{k=1}^{n_j} (x_k^j - \bar x^j)^2},
\]

where for convenience $x_k^m$ represents the k-th observation on the
explanatory variable for the normalized form of model (e), applied to Mental
Group m = i, j. Hence:

\[
\frac{B_i - B_j}{\sigma\left[\dfrac{1}{\sum_{k=1}^{n_i} (x_k^i - \bar x^i)^2} + \dfrac{1}{\sum_{k=1}^{n_j} (x_k^j - \bar x^j)^2}\right]^{1/2}} \sim N(0,1)
\]



As a consequence, under the composite hypothesis that $\hat\sigma_i^2$ and
$\hat\sigma_j^2$ estimate the same parameter $\sigma^2$ and that
$\beta_i = \beta_j$, the quantity:

\[
\frac{B_i - B_j}
{\left[\dfrac{1}{\sum_{k=1}^{n_i} (x_k^i - \bar x^i)^2} + \dfrac{1}{\sum_{k=1}^{n_j} (x_k^j - \bar x^j)^2}\right]^{1/2}
 \left[\dfrac{\dfrac{(I-1)(J-1)(T-1)}{IJT}\,(n_i \hat\sigma_i^2 + n_j \hat\sigma_j^2) - (\hat\sigma_i^2 + \hat\sigma_j^2)}
 {\dfrac{(I-1)(J-1)(T-1)}{IJT}\,(n_i + n_j) - 2}\right]^{1/2}}
\]

has t-distribution with:

\[
\frac{(I-1)(J-1)(T-1)}{IJT}\,(n_i + n_j) - 2
\]

degrees of freedom. Computing this statistic, for the normalized forms
of models (e) and (f) separately, for each pair of Mental Groups I, II,
upper III, lower III and IV yields the results given in Table II.
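
As a check on the arithmetic, the statistic can be computed from the summary figures reported in Table I. The sketch below is illustrative only: it assumes the pooled form of the statistic described above and recovers each sum of squared deviations from SE and the variance estimate through SE^2 = (variance estimate) / (sum of squares). The function name `pairwise_t` is hypothetical.

```python
import math

def pairwise_t(b_i, se_i, var_i, n_i, b_j, se_j, var_j, n_j, I=61, J=9, T=7):
    """t-statistic and degrees of freedom for comparing two coefficients."""
    c = (I - 1) * (J - 1) * (T - 1) / (I * J * T)
    df = c * (n_i + n_j) - 2
    pooled_var = (c * (n_i * var_i + n_j * var_j) - (var_i + var_j)) / df
    # 1 / sum of squared deviations, recovered from SE^2 = var / sum of squares.
    inv_ss = se_i ** 2 / var_i + se_j ** 2 / var_j
    t = (b_i - b_j) / math.sqrt(pooled_var * inv_ss)
    return abs(t), df

# Mental Groups 1 and 2, normalized model (e), values taken from Table I.
t_12, df_12 = pairwise_t(1.17260, 0.26011, 0.19601, 720,
                         1.76626, 0.17863, 0.15014, 1259)
```

With these inputs the result is close to the entry for the pair (1,2) in Table II (a t of about 1.95 on about 1481 degrees of freedom).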

Note that at a sufficiently stringent level of significance, none of the
coefficients $B_i$, $B_j$ (for either model (e) or (f)) test significantly
different from each other, so that at such a level the composite null
hypothesis that $\hat\sigma_i^2$ and $\hat\sigma_j^2$ both estimate a common
$\sigma^2$ and that $\beta_i = \beta_j$ cannot be rejected. But note that
the magnitudes of the computed t-statistics for the most part give credence
(especially in the normalized form of model (f)) to the observations that
prompted this test: the sets $\{\beta_2, \beta_3\}$ and
$\{\beta_1, \beta_4, \beta_5\}$ of coefficients may be accepted as being
different from each other, and the coefficients within each of these sets
may be accepted as being the same, at an appreciably higher level of
significance than any other partition of the set
$\{\beta_1, \beta_2, \beta_3, \beta_4, \beta_5\}$.

                     TABLE II

      i,j       t(R)      t(R/(1-R))      df

      1,2       1.95         1.98        1481
      1,3       2.00         1.57        1284
      1,4       0.53         0.28        1141
      1,5       0.84         0.51         935
      2,3       0.28         0.28        1688
      2,4       1.57         1.91        1545
      2,5       0.75         1.17        1339
      3,4       1.68         1.47        1348
      3,5       0.91         0.87        1142
      4,5       0.47         0.32         999

(i,j) refers to the comparison of coefficients for Mental Groups i and j.
t(R) is the computed t-statistic for the normalized form of model (e).
t(R/(1-R)) is the computed t-statistic for the normalized form of model (f).
df is the appropriate degrees of freedom of the t-distribution to the
nearest integer.

IV. FEATURES OF THE MODEL



A. A MORE GENERAL CROSS-SECTIONAL MODEL 

Consider a slightly more general form of the reenlistment model.
For simplicity in the derivation of results, suppose that three
cross-sectional dimensions are involved. Let
$Y = X\beta + Z\eta + \varepsilon$, where Y is an n-vector of observations
on the dependent variable, X is an n x k matrix of observations on k
explanatory variables, each of which varies over all cross-sectional
dimensions (as did $WM_{ijt}$ in the reenlistment model), $\beta$ is a
k-vector of coefficients corresponding to the variables in X, Z is an
n x m matrix of observations on m explanatory variables, each of which
varies over at most two cross-sectional dimensions (as did $WC_{it}$ and
$C_t$, for example, in the reenlistment model), and $\eta$ is an m-vector of
coefficients corresponding to the variables in Z. Then it is evident that,
if the observations are "normalized" as in the reenlistment model, the
variables Z will disappear from the normalized data. So the model in
normalized form becomes $Y^\circ = X^\circ\beta + \varepsilon^\circ$, where,
for example, the typical element of $\varepsilon^\circ$ is:

\[
\varepsilon_{ijt} - \varepsilon_{ij\cdot} - \varepsilon_{i\cdot t} - \varepsilon_{\cdot jt}
 + \varepsilon_{i\cdot\cdot} + \varepsilon_{\cdot j\cdot} + \varepsilon_{\cdot\cdot t} - \varepsilon_{\cdot\cdot\cdot}
\]

The procedure of normalizing data in this manner, then, is advantageous
when it is desirable to rid the model of one or more of the variables in
Z. For example, theoretical or practical considerations may dictate
that a variable in Z be included in the model, but this variable may in
practice turn out to be unobserved (as was $WC_{it}$ in the reenlistment
model) or even unobservable (as was one of the variables in the
reenlistment model). An obvious disadvantage is that all the variables Z
disappear in the normalized data, so that none of the coefficients in
$\eta$ can be estimated using normalized observations. The normalization
procedure can also be used to advantage to rid the model of disturbance
terms of a certain form. This is the subject of a later part of this section.



B. A NECESSARY IDEMPOTENT MATRIX 

Consider the set of all ordered triples of three indices, i, j, t:

\[
\{(i,j,t) : i = 1, \ldots, I;\ j = 1, \ldots, J;\ t = 1, \ldots, T\}
\]

There are IJT unique such ordered triples. Construct an IJT x IJT
matrix, the rows and columns of which are each indexed with one of the
ordered triples (i, j, t), as follows: If the p-th row of this matrix,
call it V, is indexed with $(i_1, j_1, t_1)$, then the p-th column of V is
also indexed with $(i_1, j_1, t_1)$. For the row of V indexed with
$(i_1, j_1, t_1)$ and the column of V indexed with $(i_2, j_2, t_2)$, let
the corresponding element of V be equal to

   -(J-1)(T-1)/IJT        if  $i_1 \ne i_2$, $j_1 = j_2$, $t_1 = t_2$

   -(I-1)(T-1)/IJT        if  $i_1 = i_2$, $j_1 \ne j_2$, $t_1 = t_2$

   -(I-1)(J-1)/IJT        if  $i_1 = i_2$, $j_1 = j_2$, $t_1 \ne t_2$

   (T-1)/IJT              if  $i_1 \ne i_2$, $j_1 \ne j_2$, $t_1 = t_2$

   (J-1)/IJT              if  $i_1 \ne i_2$, $j_1 = j_2$, $t_1 \ne t_2$

   (I-1)/IJT              if  $i_1 = i_2$, $j_1 \ne j_2$, $t_1 \ne t_2$

   -1/IJT                 if  $i_1 \ne i_2$, $j_1 \ne j_2$, $t_1 \ne t_2$

   (I-1)(J-1)(T-1)/IJT    if  $i_1 = i_2$, $j_1 = j_2$, $t_1 = t_2$

Within each row and each column of V, then, there are (I-1) elements of
the first type, (J-1) elements of the second type, (T-1) elements of the
third type, (I-1)(J-1) elements of the fourth type, (I-1)(T-1) elements
of the fifth type, (J-1)(T-1) elements of the sixth type, (I-1)(J-1)(T-1)
elements of the seventh type, and one element of the eighth type.

From the symmetrical construction of V, it is apparent that V is 
symmetric. That V is singular is also apparent, since VN = 0, where N 
is the n-vector with unit elements (that is, the sum of the elements in 
each row and each column of V is equal to zero) and n = IJT. 
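
The construction of V can be sketched directly from the eight cases. The illustration below relies on the observation that each case is a product of three factors, one per index: (size - 1) where the two indices agree and -1 where they differ, all divided by IJT. The function name `build_v` and the small dimensions are hypothetical.

```python
import itertools
import numpy as np

def build_v(I, J, T):
    """Build V element by element from the row and column index triples."""
    n = I * J * T
    triples = list(itertools.product(range(I), range(J), range(T)))
    v = np.empty((n, n))
    for p, (i1, j1, t1) in enumerate(triples):
        for q, (i2, j2, t2) in enumerate(triples):
            f_i = (I - 1) if i1 == i2 else -1
            f_j = (J - 1) if j1 == j2 else -1
            f_t = (T - 1) if t1 == t2 else -1
            v[p, q] = f_i * f_j * f_t / n     # reproduces each of the eight cases
    return v

V = build_v(3, 2, 4)
row_sums = V.sum(axis=1)
```

For these dimensions each diagonal element is (2)(1)(3)/24 = 0.25, and the row sums vanish, which is the singularity noted in the text.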

And it can be shown that V is idempotent as well. Let X be an
arbitrary n x r matrix. For convenience of representation, let the m-th
row of X be indexed with the same ordered triple (i, j, t) as the m-th
row of V. Consider the k-th column of VX. If $X^k$ is the k-th column of X,
then $VX^k$ is the k-th column of VX, so that without loss of generality it
is necessary only to consider the case r = 1 in order to establish the
form of VX. Let $X_{i_1 j_1 t_1}$ be a typical element of the n x 1 matrix X.
Then the $(i_1, j_1, t_1)$-th element of VX is of the form:

\[
\begin{aligned}
&\frac{1}{IJT}\Bigl[(I-1)(J-1)(T-1)\,X_{i_1 j_1 t_1}
 - (J-1)(T-1)\sum_{i \ne i_1} X_{i j_1 t_1}
 - (I-1)(T-1)\sum_{j \ne j_1} X_{i_1 j t_1}
 - (I-1)(J-1)\sum_{t \ne t_1} X_{i_1 j_1 t} \\
&\qquad + (T-1)\sum_{i \ne i_1}\sum_{j \ne j_1} X_{i j t_1}
 + (J-1)\sum_{i \ne i_1}\sum_{t \ne t_1} X_{i j_1 t}
 + (I-1)\sum_{j \ne j_1}\sum_{t \ne t_1} X_{i_1 j t}
 - \sum_{i \ne i_1}\sum_{j \ne j_1}\sum_{t \ne t_1} X_{ijt}\Bigr] \\
&= \frac{1}{IJT}\Bigl[IJT\,X_{i_1 j_1 t_1}
 - JT\sum_{i=1}^{I} X_{i j_1 t_1}
 - IT\sum_{j=1}^{J} X_{i_1 j t_1}
 - IJ\sum_{t=1}^{T} X_{i_1 j_1 t} \\
&\qquad + T\sum_{i=1}^{I}\sum_{j=1}^{J} X_{i j t_1}
 + J\sum_{i=1}^{I}\sum_{t=1}^{T} X_{i j_1 t}
 + I\sum_{j=1}^{J}\sum_{t=1}^{T} X_{i_1 j t}
 - \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{t=1}^{T} X_{ijt}\Bigr] \\
&= X_{i_1 j_1 t_1} - X_{\cdot j_1 t_1} - X_{i_1 \cdot t_1} - X_{i_1 j_1 \cdot}
 + X_{i_1 \cdot\cdot} + X_{\cdot j_1 \cdot} + X_{\cdot\cdot t_1} - X_{\cdot\cdot\cdot}
\end{aligned}
\]

That is, the matrix V is the linear transformation which reduces the
original data X to data in the normalized form.

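Now consider the matrix product VVX. Let $X^{\circ\circ}_{i_1 j_1 t_1}$ be
the typical element of VVX, and let $X^{\circ}_{i_1 j_1 t_1}$ represent the
typical element of VX:

\[
X^{\circ}_{i_1 j_1 t_1} = X_{i_1 j_1 t_1} - X_{\cdot j_1 t_1} - X_{i_1 \cdot t_1} - X_{i_1 j_1 \cdot}
 + X_{i_1 \cdot\cdot} + X_{\cdot j_1 \cdot} + X_{\cdot\cdot t_1} - X_{\cdot\cdot\cdot}
\]

Analogous to the above derivation,

\[
X^{\circ\circ}_{i_1 j_1 t_1} = X^{\circ}_{i_1 j_1 t_1} - X^{\circ}_{\cdot j_1 t_1} - X^{\circ}_{i_1 \cdot t_1} - X^{\circ}_{i_1 j_1 \cdot}
 + X^{\circ}_{i_1 \cdot\cdot} + X^{\circ}_{\cdot j_1 \cdot} + X^{\circ}_{\cdot\cdot t_1} - X^{\circ}_{\cdot\cdot\cdot}
\]

But:

\[
\begin{aligned}
X^{\circ}_{\cdot j_1 t_1} &= X_{\cdot j_1 t_1} - X_{\cdot j_1 t_1} - X_{\cdot\cdot t_1} - X_{\cdot j_1 \cdot}
 + X_{\cdot\cdot\cdot} + X_{\cdot j_1 \cdot} + X_{\cdot\cdot t_1} - X_{\cdot\cdot\cdot} = 0 \\
X^{\circ}_{i_1 \cdot t_1} &= X_{i_1 \cdot t_1} - X_{\cdot\cdot t_1} - X_{i_1 \cdot t_1} - X_{i_1 \cdot\cdot}
 + X_{i_1 \cdot\cdot} + X_{\cdot\cdot\cdot} + X_{\cdot\cdot t_1} - X_{\cdot\cdot\cdot} = 0 \\
X^{\circ}_{i_1 j_1 \cdot} &= X_{i_1 j_1 \cdot} - X_{\cdot j_1 \cdot} - X_{i_1 \cdot\cdot} - X_{i_1 j_1 \cdot}
 + X_{i_1 \cdot\cdot} + X_{\cdot j_1 \cdot} + X_{\cdot\cdot\cdot} - X_{\cdot\cdot\cdot} = 0 \\
X^{\circ}_{i_1 \cdot\cdot} &= X_{i_1 \cdot\cdot} - X_{\cdot\cdot\cdot} - X_{i_1 \cdot\cdot} - X_{i_1 \cdot\cdot}
 + X_{i_1 \cdot\cdot} + X_{\cdot\cdot\cdot} + X_{\cdot\cdot\cdot} - X_{\cdot\cdot\cdot} = 0 \\
X^{\circ}_{\cdot j_1 \cdot} &= X_{\cdot j_1 \cdot} - X_{\cdot j_1 \cdot} - X_{\cdot\cdot\cdot} - X_{\cdot j_1 \cdot}
 + X_{\cdot\cdot\cdot} + X_{\cdot j_1 \cdot} + X_{\cdot\cdot\cdot} - X_{\cdot\cdot\cdot} = 0 \\
X^{\circ}_{\cdot\cdot t_1} &= X_{\cdot\cdot t_1} - X_{\cdot\cdot t_1} - X_{\cdot\cdot t_1} - X_{\cdot\cdot\cdot}
 + X_{\cdot\cdot\cdot} + X_{\cdot\cdot\cdot} + X_{\cdot\cdot t_1} - X_{\cdot\cdot\cdot} = 0 \\
X^{\circ}_{\cdot\cdot\cdot} &= X_{\cdot\cdot\cdot} - X_{\cdot\cdot\cdot} - X_{\cdot\cdot\cdot} - X_{\cdot\cdot\cdot}
 + X_{\cdot\cdot\cdot} + X_{\cdot\cdot\cdot} + X_{\cdot\cdot\cdot} - X_{\cdot\cdot\cdot} = 0.
\end{aligned}
\]

So that:

\[
X^{\circ\circ}_{i_1 j_1 t_1} = X^{\circ}_{i_1 j_1 t_1}.
\]

In particular this holds for the vector $X^k$ which has zeros in each
element except the k-th, which is equal to one. That is, $VVX^k = VX^k$. But
$VVX^k$ is the k-th column of VV, and $VX^k$ is the k-th column of V. This
holds for each k = 1, ..., IJT, so that each column of VV is equal to the
corresponding column of V. Hence VV = V, so that V is, by definition,
idempotent.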

The idempotency of V can be seen equivalently as follows. Consider
the equation $VX = \lambda X$, where $\lambda$ is any eigenvalue of V, and X
is a corresponding eigenvector ($X \ne 0$ by assumption). Pre-multiplying
both sides of this equation by V yields:

\[
VVX = V\lambda X = \lambda VX = \lambda^2 X.
\]

But $VVX = VX = \lambda X$, so that $\lambda X = \lambda^2 X$. So either
$\lambda = 0$ or it is possible to divide by $\lambda$ to get
$X = \lambda X$. Or $X'X = X'\lambda X = \lambda X'X$, where $X'X$ is a
strictly positive scalar. Hence if $\lambda \ne 0$, then
$\lambda = X'X/X'X = 1$. That is, for the matrix V, all eigenvalues are
equal to 1 or to 0. Now the claim that V is idempotent can be made, since a
sufficient condition for a symmetric matrix to be idempotent is that each
of its non-zero eigenvalues be equal to unity.

Now since V is idempotent, its rank is equal to its trace. And the
trace of V is equal to the sum of its diagonal elements. That is,
tr(V) = IJT [(I-1)(J-1)(T-1)/IJT] = (I-1)(J-1)(T-1). Hence the rank of V
is (I-1)(J-1)(T-1).
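
These properties can be verified numerically for small dimensions. The sketch below assumes V is written as a Kronecker product of one-dimensional centering matrices, which gives the same eight element values as the construction in this section when rows are ordered by (i, j, t) with t varying fastest. The helper `centering` and the simulated data are hypothetical.

```python
import numpy as np

def centering(m):
    """m x m matrix that subtracts the mean over one dimension."""
    return np.eye(m) - np.full((m, m), 1.0 / m)

I, J, T = 3, 4, 2
V = np.kron(np.kron(centering(I), centering(J)), centering(T))

rng = np.random.default_rng(2)
x = rng.normal(size=(I, J, T))

# Normalized data computed directly from the marginal means.
direct = (x
          - x.mean(axis=0, keepdims=True)
          - x.mean(axis=1, keepdims=True)
          - x.mean(axis=2, keepdims=True)
          + x.mean(axis=(1, 2), keepdims=True)
          + x.mean(axis=(0, 2), keepdims=True)
          + x.mean(axis=(0, 1), keepdims=True)
          - x.mean())
via_v = (V @ x.ravel()).reshape(I, J, T)     # the same data obtained as VX

eigenvalues = np.linalg.eigvalsh(V)          # V is symmetric
rank = np.linalg.matrix_rank(V)
trace = np.trace(V)
```

Here `rank` and `trace` both equal (I-1)(J-1)(T-1) = 6, and every eigenvalue is 0 or 1 up to rounding error.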



C. ORDINARY LEAST SQUARES ESTIMATION UNDER THE TRANSFORMATION V

Consider once again the model described in Section A,
$Y = X\beta + Z\eta + \varepsilon$, where Y, X, $\beta$, Z, $\eta$ and
$\varepsilon$ are as defined there. Recall that the number of
cross-sectional dimensions involved was assumed, for purely illustrative
purposes, to be three. Suppose that one cross-sectional dimension is
resolved into I categories, the second dimension into J categories, and
the third dimension into T categories. Then there are n = IJT
observations in Y, and to each observation in Y there can be assigned a
unique ordered triple (i,j,t) which represents the appropriate category of
each of the cross-sectional dimensions for that observation in Y. Obviously
this same ordered triple is assigned to the corresponding observations
of the variables in X and in Z, as well as to the corresponding element
of $\varepsilon$. Now suppose that the matrix V has been constructed so that
the index of the p-th row of V is equal to the index of the p-th
observation in Y. Then pre-multiplying the above equation by V yields
$VY = VX\beta + VZ\eta + V\varepsilon$, where $VZ = 0$, $VY \ne 0$ and
$VX \ne 0$, since by assumption the dependent variable
whose observations are represented by Y and the k explanatory variables
whose observations are represented by X vary over all cross-sectional
dimensions, while the variables whose observations are represented by Z
vary over at most two cross-sectional dimensions. So the equation
becomes $VY = VX\beta + V\varepsilon$.

Note that the above property provides a concise operational definition
of the phrase "varies over all cross-sectional dimensions." A
non-stochastic variable whose vector of observations, over all possible
categories of the cross-sectional dimensions, is given by W may be said
to vary over all cross-sectional dimensions if $VW \ne 0$. It will be shown
in a later section that the element of VW which is indexed by (i,j,t) may
be interpreted as the three-way interaction of the i-th category of one
cross-sectional dimension, the j-th category of the second dimension, and
the t-th category of the third dimension. Similarly, for a stochastic
variable whose vector of observations is given by W, the element of VW
indexed by (i,j,t) may be interpreted as the sample estimate of this
three-way interaction term.

Now in order to discuss the ordinary least squares estimator of $\beta$ in
the equation $VY = VX\beta + V\varepsilon$ it is necessary to consider the
rank of VX. Suppose that r(X) = k (k < n), so that $(X'X)^{-1}$ exists. If
it were the case that r(X) < k, then the coefficient vector $\beta$ in the
equation $Y = X\beta + Z\eta + \varepsilon$ would be inestimable in the
original data, since a necessary condition for the ordinary least squares
estimators, in the original data, of $\beta$ and $\eta$ to exist is that
both X'X and Z'Z are nonsingular. That is, these estimators in the original
data, in partitioned matrix form,

\[
\begin{bmatrix} \hat\beta \\ \hat\eta \end{bmatrix}
= \begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z \end{bmatrix}^{-1}
\begin{bmatrix} X'Y \\ Z'Y \end{bmatrix},
\]

exist only if $(X'X)^{-1}$ and $(Z'Z)^{-1}$ exist. So the assumption that
r(X) = k is no more restrictive in the ordinary least squares estimation of
$\beta$ using data in the form VY, VX than it was in the ordinary least
squares estimation of $\beta$ using the original data Y, X. [Note that this
discussion applies only to estimation of the originally specified k-vector
$\beta$ of coefficients. It may of course be possible, even if r(X) < k, to
estimate a linear combination of some of the coefficients in $\beta$. But
this is not the goal here.] Now since r(V) = (I-1)(J-1)(T-1), a necessary
condition for (VX)'(VX) = X'VX to be nonsingular is that r(VX) = k.
So a necessary condition is that $k \le (I-1)(J-1)(T-1)$. That is, that the
matrix X represents observations on at most (I-1)(J-1)(T-1) explanatory
variables. Consequently, in all discussion hereafter, the requirement
that $k \le (I-1)(J-1)(T-1) < IJT = n$ will be made.

Additionally, the requirement that r(VX) = k means that the columns
of VX must be linearly independent. But these are simply the vectors
which represent the three-way interaction terms for each variable in X.
This is a new restriction, not encountered when basing estimators upon
the original observations. It may turn out, in some cases, to prohibit
application of V in the model. It is certainly not prohibitive when X
represents observations on only one explanatory variable (as was the case
for $WM_{ijt}$ in the reenlistment model). It may be worth noting that the
circumstances in which r(VX) < k can be stated more succinctly: r(VX)
< k if and only if some linear combination of the vectors in X is in the
null space of the transformation V.

If r (VX) = k, then X'VX is nonsingular, and the ordinary least 
squares estimator, under the transformation V, for B in Y = XB + Zfi + f 
is B = ((VX)'(VX))“^ (VX)'(VY) = (X'VX)~^X'VY. 

A definition of terms should now be made. B, in the equation above,
has been called an estimator for β under the transformation V. But it
is clear that if B is linear in VY, then it is also linear in Y. That
is, for any linear transformation A, A(VY) = CY for some linear
transformation C. The reason for this apparently unnecessary terminology
is that this estimator B is the best linear unbiased estimator for β (it
will be shown later) among all those unbiased estimators for β that are
linear in VY. [The definition of "best" used throughout this paper is that
employed in the Gauss-Markov theorem. An estimator b for β in the equation
Y = Xβ + ZΩ + ε is best linear unbiased if it is linear in Y, if it is
unbiased, and if any other estimator of β which is also linear in Y and
unbiased has a covariance matrix which exceeds that of b by a positive
semidefinite matrix.] That B can be the best unbiased estimator linear in
VY and yet not be the best unbiased estimator linear in Y is clear, since
the transformation V is not invertible. That is, no linear transformation
on VW can reproduce W. If this were possible, then there would exist some
matrix A such that AVW = W for all W. But since V is singular, there
must exist a vector W_1 (not identically zero) such that VW_1 = 0.
Specifically, W_1 = N can be the n-vector with unit elements. So AVW_1 =
A0 = 0 ≠ W_1. [Equivalently, V is not an isomorphism. It has null space
S = {W : VW = 0}. Consequently, V maps all vectors of the form Z + cN,
where c is a scalar and N the n-vector of unit elements, into the vector
VZ.] In addition to being the best linear unbiased estimator for β
under the transformation V, B is in many cases the best linear unbiased
estimator for β as well. This is the subject of the next part of this
section.

D. POOLED TIME SERIES AND CROSS-SECTION DATA: EFFECT OF THE COMPOSITION
OF THE DISTURBANCE TERM ON THE MODEL

The ordinary least squares estimator for β, under V, shows a degree
of insensitivity in its quality of "best linear unbiasedness under V" to
the composition of the disturbance term of the model. The type of
composition of the disturbance term for which the property of best
linear unbiasedness, under V, of B is invariant is considered here.

It may happen that in a regression model involving time series and
cross-section data the disturbance term for an observation is composed
of effects due to the cross-section, an effect due to the time series,
and a series of remainder terms (that is, components of the disturbance
term which are due to the joint effects of cross-section and time
series).^4 For example, the disturbance term for economic entity i,
subject to factor j at time t may be given by:

[Footnote 4: As postulated by, for example, Kuh [11] and Chetty [12].]

1. ε_ijt = η_ijt + α_i + γ_j + δ_t + λ_ij + ω_it + π_jt

2. E(η_ijt) = 0, i = 1,...,I; j = 1,...,J; t = 1,...,T

3. Var(η_ijt) = σ² for all i, j, t

4. The η_ijt's are independent, Normally distributed random variables

5. No statements can be made concerning the distributions of the
random variables α_i, γ_j, δ_t, λ_ij, ω_it, π_jt.

6. No statements can be made concerning the independence, or
correlations, of the random variables η_ijt, α_i, γ_j, δ_t, λ_ij, ω_it,
π_jt (other than as in 4. above)

7. Each random variable is invariant over any dimension not included
as a subscript in its notational expression.

The disturbance structure hypothesized here is central to later work. 
For ease of reference, call the error structure formally assumed by 
statements 1. through 7. above "disturbance structure (A)." 

Under the specifications of disturbance structure (A), no conclusion
can be made about the form of E(ε) or Var(ε). Consequently no claims
can be made regarding the unbiasedness of the ordinary least squares
estimator for β in the original data. And the generalized least squares
estimator is unknown, since Var(ε) is unknown. But for ε = [ε_ijt] and
η = [η_ijt] specified above, Vε = Vη, since Vα = Vγ = Vδ = Vλ = Vω
= Vπ = 0. Hence under disturbance structure (A) the ordinary least
squares estimator, under V, for β is unbiased:

$$B = (X'VX)^{-1}X'VY$$

$$E(B) = E[(X'VX)^{-1}X'VY] = E[(X'VX)^{-1}X'(VX\beta + V\varepsilon)] = \beta + (X'VX)^{-1}X'V\,E(\eta) = \beta + 0 = \beta.$$

And the variance of B is given by:

$$\begin{aligned}
\mathrm{Var}(B) &= E[(B-\beta)(B-\beta)'] \\
&= E[(X'VX)^{-1}X'V\varepsilon\varepsilon'VX(X'VX)^{-1}] \\
&= E[(X'VX)^{-1}X'V\eta\eta'VX(X'VX)^{-1}] \\
&= (X'VX)^{-1}X'V\,E(\eta\eta')\,VX(X'VX)^{-1} \\
&= \sigma^2 (X'VX)^{-1}X'V\,I\,VX(X'VX)^{-1} \\
&= \sigma^2 (X'VX)^{-1}X'VX(X'VX)^{-1} = \sigma^2 (X'VX)^{-1},
\end{aligned}$$

since E(ηη') = σ²I, and since V is idempotent.
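The insensitivity of B to the nuisance components can be checked numerically. The following is a minimal sketch for a single regressor (k = 1), assuming that V acts as the alternating sum of group means over every subset of dimensions; the names `normalize` and `estimate_b`, and the particular nuisance terms, are illustrative assumptions rather than anything defined in the original text.

```python
from fractions import Fraction
from itertools import product


def normalize(w, shape):
    """Apply V to w (dict keyed by index tuples) as an alternating sum of group means."""
    cells = list(product(*(range(n) for n in shape)))
    dims = range(len(shape))
    out = {}
    for c in cells:
        total = Fraction(0)
        # keep[m] is True when dimension m is held fixed, False when averaged over.
        for keep in product((True, False), repeat=len(shape)):
            group = [w[d] for d in cells
                     if all(d[m] == c[m] for m in dims if keep[m])]
            sign = (-1) ** keep.count(False)
            total += sign * Fraction(sum(group), len(group))
        out[c] = total
    return out


def estimate_b(x, y, shape):
    """B = (X'VX)^{-1} X'VY for one regressor, using V symmetric and idempotent."""
    vx, vy = normalize(x, shape), normalize(y, shape)
    return sum(vx[c] * vy[c] for c in vx) / sum(vx[c] ** 2 for c in vx)


shape = (2, 3, 2)
cells = list(product(*(range(n) for n in shape)))
x = {(i, j, t): (i + 1) * (j + 1) * (t + 1) for (i, j, t) in cells}
# Hypothetical nuisance terms: each varies over at most two of the three dimensions.
nuisance = {(i, j, t): 5 * i + j * j + 7 * t + 10 * i * j - 3 * i * t + j * t
            for (i, j, t) in cells}
y = {c: 2 * x[c] + nuisance[c] for c in cells}   # true coefficient 2, no eta noise
b = estimate_b(x, y, shape)
```

With the noise term η omitted, the sketch recovers the coefficient exactly, since every nuisance term lies in the null space of V.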

It is now possible to show that, under disturbance structure (A),
B is the best linear unbiased estimator, under V, for β. But it is first
worthwhile to show that any linear transformation which has null space
identical to that of V (that is, any linear transformation which maps
precisely the same vectors onto the null vector) is itself a linear
transformation, under a nonsingular matrix, of V. That is, that the
matrix V which removes the stochastic variables α_i, γ_j, δ_t, λ_ij,
ω_it, and π_jt from the disturbance term, and under which the image of a
vector [η_ijt] which varies over all dimensions is non-null, is unique up
to a nonsingular linear transformation C. Suppose there exists another
linear transformation, say A, such that Aε = Aη (Aα = Aγ = Aδ = Aλ = Aω
= Aπ = 0), for all n-vectors ε. Then since A and V are to have the same
null space, AX = 0 if and only if VX = 0. In particular, this must hold
for the vector VX: AVX = 0 if and only if VVX = VX = 0. An equivalent
statement is that the system A(VX) = 0 has only the trivial solution
VX = 0. Hence either A is nonsingular or A = CV for nonsingular C (in the
latter case AVX = CVVX = CVX and AX = CVX). But if A is nonsingular, then
AX = 0 implies that X = 0. So, for nonsingular A, A and V could not have
the same null space. Hence A = CV, for nonsingular C.

Now since CV, for nonsingular C, is the only linear transformation
which removes the stochastic variables α_i, γ_j, δ_t, λ_ij, ω_it, π_jt
from the model, any other unbiased estimator of β must be linear in CVY,
hence in VY. Consider any other such estimator, say AVY, where A is a
k x n matrix independent of Y.

Let D = A - (X'VX)^{-1}X'V. Then

$$\begin{aligned}
AVY &= [D + (X'VX)^{-1}X'V]\,VY \\
&= [D + (X'VX)^{-1}X'V]\,[VX\beta + V\varepsilon] \\
&= [DVX + I]\,\beta + [D + (X'VX)^{-1}X'V]\,V\varepsilon .
\end{aligned}$$

But

$$E(AVY) = (DVX + I)\beta + [D + (X'VX)^{-1}X'V]\,E(V\varepsilon) = (DVX + I)\beta + [DV + (X'VX)^{-1}X'V]\,E(\eta) = (DVX + I)\beta .$$

So in order for AVY to be unbiased, it is necessary that DVX = 0. So the
estimator becomes β + [D + (X'VX)^{-1}X'V]Vε. The corresponding sampling
error is [D + (X'VX)^{-1}X'V]Vε = [DV + (X'VX)^{-1}X'V]η, and the
covariance matrix is:

$$\begin{aligned}
& E\{[DV + (X'VX)^{-1}X'V]\,\eta\eta'\,[VD' + VX(X'VX)^{-1}]\} \\
&= [DV + (X'VX)^{-1}X'V]\,E(\eta\eta')\,[VD' + VX(X'VX)^{-1}] \\
&= \sigma^2 [DV + (X'VX)^{-1}X'V]\,[VD' + VX(X'VX)^{-1}] \\
&= \sigma^2 [DVD' + DVX(X'VX)^{-1} + (X'VX)^{-1}X'VD' + (X'VX)^{-1}X'VX(X'VX)^{-1}] \\
&= \sigma^2 [DVD' + (X'VX)^{-1}],
\end{aligned}$$

where the cross terms vanish because DVX = 0.

So the covariance matrix of the estimator AVY exceeds the covariance
matrix of B = (X'VX)^{-1}X'VY by σ²DVD', a positive semidefinite matrix.
Hence B is the best linear unbiased estimator under V in the sense that
its covariance matrix is exceeded, by a positive semidefinite matrix, by
the covariance matrix of any other linear unbiased estimator of β under V.

And, since B is the best linear unbiased estimator for β under V,
and since only those estimators linear in VY can claim to be unbiased,
the estimator B is the best linear unbiased estimator for β under
disturbance structure (A).

The discussion of the hypothesized error structure has been couched
in terms of pooled cross-section and time series data. But in any
regression model involving cross-sectional data (no matter what the nature
of the cross-sectional dimensions) it is clear that, if no more specific
statement about the error structure can be made than that disturbance
structure (A) applies, then B = (X'VX)^{-1}X'VY is the best linear
unbiased estimator for β.

E. AN UNBIASED ESTIMATOR FOR σ²

Assume disturbance structure (A) from the preceding section applies.
The purpose of this section is to show that:

$$S^2 = \frac{e'e}{(I-1)(J-1)(T-1) - k}$$

is an unbiased estimator for σ² in

$$\mathrm{Var}(B) = \sigma^2 (X'VX)^{-1} .$$

Consider the estimator B = (X'VX)^{-1}X'VY of β in the model:

Y = Xβ + ZΩ + ε, VY = VXβ + Vε.

The residual vector is e = VY - VXB = VY - VX(X'VX)^{-1}X'VY =
[V - VX(X'VX)^{-1}X'V]Y. Let M = V - VX(X'VX)^{-1}X'V. Then e = MY and M
is an idempotent matrix with trace (I-1)(J-1)(T-1) - k. To see the
idempotency of M:

$$\begin{aligned}
MM &= [V - VX(X'VX)^{-1}X'V]\,[V - VX(X'VX)^{-1}X'V] \\
&= V - VX(X'VX)^{-1}X'V - VX(X'VX)^{-1}X'V + VX(X'VX)^{-1}X'VX(X'VX)^{-1}X'V \\
&= V - VX(X'VX)^{-1}X'V - VX(X'VX)^{-1}X'V + VX(X'VX)^{-1}X'V \\
&= V - VX(X'VX)^{-1}X'V = M .
\end{aligned}$$

To see tr(M) = (I-1)(J-1)(T-1) - k:

Since the trace of the difference of two matrices is equal to the
difference of the traces,

$$\mathrm{tr}(M) = \mathrm{tr}(V) - \mathrm{tr}(VX(X'VX)^{-1}X'V) = (I-1)(J-1)(T-1) - \mathrm{tr}(VX(X'VX)^{-1}X'V) .$$

And since for two matrices A, B, of compatible order, tr(AB) = tr(BA),

$$\mathrm{tr}(M) = (I-1)(J-1)(T-1) - \mathrm{tr}((X'VX)^{-1}X'VX) = (I-1)(J-1)(T-1) - \mathrm{tr}(I_k) = (I-1)(J-1)(T-1) - k,$$

where I_k is the identity matrix of order k.

The residual vector may also be written e = MY = MVY = MV(Xβ + ε)
= MVε, since MVX = VX - VX(X'VX)^{-1}X'VX = VX - VX = 0.

So the error sum of squares is e'e = ε'VM'MVε = ε'VMVε = η'VMVη =
η'Mη, since Vε = Vη. And, since η'Mη is scalar, it is equal to its own
trace: e'e = tr(η'Mη). And since tr(AB) = tr(BA), e'e = tr(η'Mη) =
tr(Mηη'). And since the trace of a square matrix is a linear operation
on the matrix, the expected value of the trace is equal to the trace of
the expected value:

$$E(e'e) = E[\mathrm{tr}(M\eta\eta')] = \mathrm{tr}[E(M\eta\eta')] = \mathrm{tr}[M\,E(\eta\eta')] = \mathrm{tr}[\sigma^2 M I] = \mathrm{tr}[\sigma^2 M] = \sigma^2\,\mathrm{tr}(M),$$

since for a scalar c and matrix A, tr(cA) = c tr(A).

So E(e'e) = σ²[(I-1)(J-1)(T-1) - k].

So, for S² = e'e/[(I-1)(J-1)(T-1) - k], E(S²) = σ².

F. THE JOINT DISTRIBUTION OF B AND S²

A theorem with application in statistical analysis may be expressed
as follows: If A is an idempotent matrix and μ is an n-variate Normal
random variable from a N(0, σ²I) distribution, then the quadratic form
(1/σ²)μ'Aμ is distributed χ² with q degrees of freedom,^5 where q = tr(A)
= rank of A. This theorem can be applied to the results of the preceding
section, which showed that e'e = η'Mη, where M is idempotent and the
elements of η are independent identically distributed Normal random
variables, each with mean zero and variance σ². By the theorem, e'e/σ²
is distributed χ² with (I-1)(J-1)(T-1) - k degrees of freedom.

Now consider the estimator B for β. It has already been shown that

E(B) = β and Var(B) = σ²(X'VX)^{-1}.

And

$$B = (X'VX)^{-1}X'VY = (X'VX)^{-1}X'V(X\beta + \varepsilon) = (X'VX)^{-1}X'VX\beta + (X'VX)^{-1}X'V\varepsilon = \beta + (X'VX)^{-1}X'V\eta .$$

So, since B is linear in the components of η, B has a multivariate Normal
distribution also:

$$B \sim N(\beta,\ \sigma^2 (X'VX)^{-1}) .$$

It can now be shown that the Chi-square and Normal distributions described
above are independent. Note that e'e/σ² = η'Mη/σ² is an idempotent
quadratic form in η, and that B = β + (X'VX)^{-1}X'Vη is a vector whose
elements are linear in η, where the components of η are independent
identically distributed random variables. A sufficient condition for
e'e/σ² and B to be statistically independent is that the product of
(X'VX)^{-1}X'V and M be equal to the null matrix.^6 That this is so is
easily verified:

$$\begin{aligned}
[(X'VX)^{-1}X'V]\,M &= [(X'VX)^{-1}X'V]\,[V - VX(X'VX)^{-1}X'V] \\
&= (X'VX)^{-1}X'V - (X'VX)^{-1}X'VX(X'VX)^{-1}X'V \\
&= (X'VX)^{-1}X'V - (X'VX)^{-1}X'V = 0 .
\end{aligned}$$

Hence e'e/σ² and B are independent.

[Footnote 5: For a proof of this theorem, as well as of the converse
implication, see Hogg, R., and Craig, A., Introduction to Mathematical
Statistics, pp. 348-351, MacMillan, 1965.]

Now since:

$$S^2 = \frac{e'e}{(I-1)(J-1)(T-1) - k} = \frac{\sigma^2}{(I-1)(J-1)(T-1) - k}\cdot\frac{e'e}{\sigma^2}$$

is linear in e'e/σ², S² and B are independent as well.

As a consequence, it is now possible to get a joint distribution of
S² and a linear combination of the components of B. Now B - β ~
N(0, σ²(X'VX)^{-1}). Let W be a k-vector of constants.

Then W'(B - β) ~ N(0, σ²W'(X'VX)^{-1}W).

And

$$\frac{W'(B-\beta)}{[\sigma^2\,W'(X'VX)^{-1}W]^{1/2}} \sim N(0,1) .$$

[Footnote 6: For a proof of this assertion, see Theil, H., Principles of
Econometrics, pp. 83-84, Wiley, 1971.]

So that, since B and S² are independent,

$$\frac{W'(B-\beta)\,[(I-1)(J-1)(T-1) - k]^{1/2}}{[\sigma^2\,W'(X'VX)^{-1}W]^{1/2}\,\{[(I-1)(J-1)(T-1) - k]\,S^2/\sigma^2\}^{1/2}} = \frac{W'(B-\beta)}{S\,[W'(X'VX)^{-1}W]^{1/2}}$$

has a t-distribution with (I-1)(J-1)(T-1) - k degrees of freedom.

So a confidence interval for W'β, a linear combination of the
elements of β, is given by

$$W'B \pm t_{1-\alpha/2}\; S\,\{W'(X'VX)^{-1}W\}^{1/2}$$

where t_{1-α/2} is the 100(1 - α/2)th percentile of a t-distribution with
(I-1)(J-1)(T-1) - k degrees of freedom.

In particular this holds for a vector W_p which has zeros in each
component, except for the p-th element which is equal to one. Application
of this vector W_p will give a confidence interval for the p-th
component of β, p = 1,...,k.

G. AN ALTERNATE DERIVATION OF V 

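The calculations which yield the elements of the matrix V, introduced
in Section B, may not be apparent. The purpose of the present section
is to delineate the sequence of steps that lead to the elements of V.

As a vehicle, consider a disturbance term of the form, once again,

$$\varepsilon_{ijt} = \eta_{ijt} + \alpha_i + \gamma_j + \delta_t + \lambda_{ij} + \omega_{it} + \pi_{jt} \tag{1}$$

where nothing is known or can be reasonably assumed about the components
of the ε_ijt except that the η_ijt's are independent Normal random
variables, each with mean zero and variance σ².

Now:

$$\varepsilon_{ij\cdot} = \frac{1}{T}\sum_t \varepsilon_{ijt} = \frac{1}{T}\sum_t \eta_{ijt} + \alpha_i + \gamma_j + \frac{1}{T}\sum_t \delta_t + \lambda_{ij} + \frac{1}{T}\sum_t \omega_{it} + \frac{1}{T}\sum_t \pi_{jt} \tag{2}$$

$$\varepsilon_{i\cdot t} = \frac{1}{J}\sum_j \varepsilon_{ijt} = \frac{1}{J}\sum_j \eta_{ijt} + \alpha_i + \frac{1}{J}\sum_j \gamma_j + \delta_t + \frac{1}{J}\sum_j \lambda_{ij} + \omega_{it} + \frac{1}{J}\sum_j \pi_{jt} \tag{3}$$

$$\varepsilon_{\cdot jt} = \frac{1}{I}\sum_i \varepsilon_{ijt} = \frac{1}{I}\sum_i \eta_{ijt} + \frac{1}{I}\sum_i \alpha_i + \gamma_j + \delta_t + \frac{1}{I}\sum_i \lambda_{ij} + \frac{1}{I}\sum_i \omega_{it} + \pi_{jt} \tag{4}$$

$$\varepsilon_{i\cdot\cdot} = \frac{1}{JT}\sum_j\sum_t \varepsilon_{ijt} = \frac{1}{JT}\sum_j\sum_t \eta_{ijt} + \alpha_i + \frac{1}{J}\sum_j \gamma_j + \frac{1}{T}\sum_t \delta_t + \frac{1}{J}\sum_j \lambda_{ij} + \frac{1}{T}\sum_t \omega_{it} + \frac{1}{JT}\sum_j\sum_t \pi_{jt} \tag{5}$$

$$\varepsilon_{\cdot j\cdot} = \frac{1}{IT}\sum_i\sum_t \varepsilon_{ijt} = \frac{1}{IT}\sum_i\sum_t \eta_{ijt} + \frac{1}{I}\sum_i \alpha_i + \gamma_j + \frac{1}{T}\sum_t \delta_t + \frac{1}{I}\sum_i \lambda_{ij} + \frac{1}{IT}\sum_i\sum_t \omega_{it} + \frac{1}{T}\sum_t \pi_{jt} \tag{6}$$

$$\varepsilon_{\cdot\cdot t} = \frac{1}{IJ}\sum_i\sum_j \varepsilon_{ijt} = \frac{1}{IJ}\sum_i\sum_j \eta_{ijt} + \frac{1}{I}\sum_i \alpha_i + \frac{1}{J}\sum_j \gamma_j + \delta_t + \frac{1}{IJ}\sum_i\sum_j \lambda_{ij} + \frac{1}{I}\sum_i \omega_{it} + \frac{1}{J}\sum_j \pi_{jt} \tag{7}$$

$$\varepsilon_{\cdot\cdot\cdot} = \frac{1}{IJT}\sum_i\sum_j\sum_t \varepsilon_{ijt} = \frac{1}{IJT}\sum_i\sum_j\sum_t \eta_{ijt} + \frac{1}{I}\sum_i \alpha_i + \frac{1}{J}\sum_j \gamma_j + \frac{1}{T}\sum_t \delta_t + \frac{1}{IJ}\sum_i\sum_j \lambda_{ij} + \frac{1}{IT}\sum_i\sum_t \omega_{it} + \frac{1}{JT}\sum_j\sum_t \pi_{jt} \tag{8}$$

Adding and subtracting (1)-(2)-(3)-(4)+(5)+(6)+(7)-(8), the disturbance
term for the ijt-th observation in normalized data becomes:

$$\begin{aligned}
\mu_{ijt} &= \varepsilon_{ijt} - \varepsilon_{ij\cdot} - \varepsilon_{i\cdot t} - \varepsilon_{\cdot jt} + \varepsilon_{i\cdot\cdot} + \varepsilon_{\cdot j\cdot} + \varepsilon_{\cdot\cdot t} - \varepsilon_{\cdot\cdot\cdot} \\
&= \eta_{ijt} - \eta_{ij\cdot} - \eta_{i\cdot t} - \eta_{\cdot jt} + \eta_{i\cdot\cdot} + \eta_{\cdot j\cdot} + \eta_{\cdot\cdot t} - \eta_{\cdot\cdot\cdot} \\
&= \frac{1}{IJT}\Big[ IJT\,\eta_{ijt} - JT\sum_i \eta_{ijt} - IT\sum_j \eta_{ijt} - IJ\sum_t \eta_{ijt} + I\sum_j\sum_t \eta_{ijt} + J\sum_i\sum_t \eta_{ijt} + T\sum_i\sum_j \eta_{ijt} - \sum_i\sum_j\sum_t \eta_{ijt} \Big] .
\end{aligned}$$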

The equations (2) through (8) above were written out in the inconvenient
summative form to make obvious the fact that the variables α_i, γ_j, δ_t,
λ_ij, ω_it and π_jt disappear completely from the disturbance term of the
normalized model. This is so since the equations (1) through (8) are
written in terms of the random variables themselves, not in terms of
realizations of these random variables. These random variables also
disappear, of course, in the event that one or more of them is degenerate,
as might happen if an unobservable explanatory variable were implicitly
included in the disturbance term ε_ijt.

The expression for μ_ijt consists of adding and subtracting various
multiples of given random variables. But in this expression any random
variable η_ijt may be included under more than one summation sign.
Concentrate on one normalized disturbance term, say μ_{i_1 j_1 t_1}, and
rearrange terms in the series of summations so that each random variable
η_ijt appears once and only once in the expression for μ_{i_1 j_1 t_1}:

$$\begin{aligned}
\mu_{i_1 j_1 t_1} &= \frac{1}{IJT}\Big\{ IJT\,\eta_{i_1 j_1 t_1} - JT\sum_i \eta_{i j_1 t_1} - IT\sum_j \eta_{i_1 j t_1} - IJ\sum_t \eta_{i_1 j_1 t} \\
&\qquad + I\sum_j\sum_t \eta_{i_1 j t} + J\sum_i\sum_t \eta_{i j_1 t} + T\sum_i\sum_j \eta_{i j t_1} - \sum_i\sum_j\sum_t \eta_{ijt} \Big\} \\
&= \frac{1}{IJT}\Big\{ (I-1)(J-1)(T-1)\,\eta_{i_1 j_1 t_1} - (J-1)(T-1)\sum_{i \ne i_1} \eta_{i j_1 t_1} \\
&\qquad - (I-1)(T-1)\sum_{j \ne j_1} \eta_{i_1 j t_1} - (I-1)(J-1)\sum_{t \ne t_1} \eta_{i_1 j_1 t} \\
&\qquad + (T-1)\sum_{i \ne i_1}\sum_{j \ne j_1} \eta_{i j t_1} + (J-1)\sum_{i \ne i_1}\sum_{t \ne t_1} \eta_{i j_1 t} + (I-1)\sum_{j \ne j_1}\sum_{t \ne t_1} \eta_{i_1 j t} \\
&\qquad - \sum_{i \ne i_1}\sum_{j \ne j_1}\sum_{t \ne t_1} \eta_{ijt} \Big\} .
\end{aligned}$$

So that μ_{i_1 j_1 t_1} is a series of summations of independent,
identically distributed Normal random variables.

Since each of these random variables has mean zero and variance
σ², it is clear that:

$$E(\mu_{i_1 j_1 t_1}) = 0$$

and

$$\begin{aligned}
\mathrm{Var}(\mu_{i_1 j_1 t_1}) &= \Big(\frac{1}{IJT}\Big)^2 \Big\{ [(I-1)(J-1)(T-1)]^2\,\mathrm{Var}(\eta_{i_1 j_1 t_1}) + [(J-1)(T-1)]^2\,\mathrm{Var}\Big(\sum_{i \ne i_1} \eta_{i j_1 t_1}\Big) \\
&\qquad + [(I-1)(T-1)]^2\,\mathrm{Var}\Big(\sum_{j \ne j_1} \eta_{i_1 j t_1}\Big) + [(I-1)(J-1)]^2\,\mathrm{Var}\Big(\sum_{t \ne t_1} \eta_{i_1 j_1 t}\Big) \\
&\qquad + (T-1)^2\,\mathrm{Var}\Big(\sum_{i \ne i_1}\sum_{j \ne j_1} \eta_{i j t_1}\Big) + (J-1)^2\,\mathrm{Var}\Big(\sum_{i \ne i_1}\sum_{t \ne t_1} \eta_{i j_1 t}\Big) \\
&\qquad + (I-1)^2\,\mathrm{Var}\Big(\sum_{j \ne j_1}\sum_{t \ne t_1} \eta_{i_1 j t}\Big) + \mathrm{Var}\Big(\sum_{i \ne i_1}\sum_{j \ne j_1}\sum_{t \ne t_1} \eta_{ijt}\Big) \Big\} \\
&= \frac{\sigma^2}{(IJT)^2} \Big\{ [(I-1)(J-1)(T-1)]^2 + [(J-1)(T-1)]^2 (I-1) + [(I-1)(T-1)]^2 (J-1) \\
&\qquad + [(I-1)(J-1)]^2 (T-1) + (T-1)^2 (I-1)(J-1) + (J-1)^2 (I-1)(T-1) \\
&\qquad + (I-1)^2 (J-1)(T-1) + (I-1)(J-1)(T-1) \Big\} \\
&= \frac{\sigma^2 (I-1)(J-1)(T-1)}{IJT} .
\end{aligned}$$

Note that this applies for all (i_1, j_1, t_1). And since μ_ijt is a
linear combination of independent, identically distributed Normal random
variables, μ_ijt is also Normally distributed.

Note that the diagonal elements of the covariance matrix E(μμ') are
each σ²(I-1)(J-1)(T-1)/IJT. But also note that, since each of the μ_ijt's
is a linear combination of the same IJT random variables η_ijt, i = 1,...,I,
j = 1,...,J, t = 1,...,T, the μ_ijt are not independent.

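The remainder of the covariance matrix may be found by straightforward
but tedious calculations. Since E(μ_ijt) = 0, these calculations (using
the summative expression in the η_ijt's for each μ_{i_1 j_1 t_1}) yield

$$\mathrm{Cov}(\mu_{i_1 j_1 t_1}, \mu_{i_2 j_2 t_2}) = E(\mu_{i_1 j_1 t_1}\,\mu_{i_2 j_2 t_2}) =
\begin{cases}
-\dfrac{(J-1)(T-1)\sigma^2}{IJT} & \text{if } i_1 \ne i_2,\ j_1 = j_2,\ t_1 = t_2 \\[2ex]
-\dfrac{(I-1)(T-1)\sigma^2}{IJT} & \text{if } i_1 = i_2,\ j_1 \ne j_2,\ t_1 = t_2 \\[2ex]
-\dfrac{(I-1)(J-1)\sigma^2}{IJT} & \text{if } i_1 = i_2,\ j_1 = j_2,\ t_1 \ne t_2 \\[2ex]
\dfrac{(T-1)\sigma^2}{IJT} & \text{if } i_1 \ne i_2,\ j_1 \ne j_2,\ t_1 = t_2 \\[2ex]
\dfrac{(J-1)\sigma^2}{IJT} & \text{if } i_1 \ne i_2,\ j_1 = j_2,\ t_1 \ne t_2 \\[2ex]
\dfrac{(I-1)\sigma^2}{IJT} & \text{if } i_1 = i_2,\ j_1 \ne j_2,\ t_1 \ne t_2 \\[2ex]
-\dfrac{\sigma^2}{IJT} & \text{if } i_1 \ne i_2,\ j_1 \ne j_2,\ t_1 \ne t_2
\end{cases}$$

So that, for the matrix V previously defined, σ²V = E(μμ').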
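The case table can be checked mechanically. The sketch below assumes that each element of V factors into one term per dimension, (n - 1)/n when the two categories coincide and -1/n when they differ, which reproduces every case listed above together with the diagonal value; the helper names `v_element`, `build_v` and `matmul` are illustrative and do not appear in the original paper.

```python
from fractions import Fraction
from itertools import product


def v_element(c1, c2, shape):
    """Entry of V for cells c1 and c2 as a product of per-dimension factors."""
    out = Fraction(1)
    for a, b, n in zip(c1, c2, shape):
        out *= Fraction(n - 1, n) if a == b else Fraction(-1, n)
    return out


def build_v(shape):
    """Return the ordered list of cells and the full matrix V as nested lists."""
    cells = list(product(*(range(n) for n in shape)))
    return cells, [[v_element(c1, c2, shape) for c2 in cells] for c1 in cells]


def matmul(a, b):
    """Plain nested-list matrix product, exact because entries are Fractions."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


I, J, T = 2, 3, 4
cells, V = build_v((I, J, T))
trace = sum(V[r][r] for r in range(len(cells)))
is_idempotent = matmul(V, V) == V
is_symmetric = all(V[r][c] == V[c][r]
                   for r in range(len(cells)) for c in range(len(cells)))
```

For these dimensions the sketch gives a symmetric, idempotent matrix whose trace equals (I-1)(J-1)(T-1), in agreement with the properties of V used in the preceding sections.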



H. THE CASE WHEN FEWER THAN IJT OBSERVATIONS ARE USED 

Suppose the components of the disturbance term are independent
identically distributed Normal random variables with mean zero. Then
for ordinary least squares estimation in the original data the quantity
(IJT - k)S_0²/σ² has χ² distribution with IJT - k degrees of freedom,
where S_0² = e'e/(IJT - k) is the estimator in the original data of σ².
When normalized data are used the quantity:

$$[(I-1)(J-1)(T-1) - k]\,\frac{S^2}{\sigma^2} = \frac{e'e}{\sigma^2}$$

has χ² distribution with

$$(I-1)(J-1)(T-1) - k = \frac{(I-1)(J-1)(T-1)}{IJT}\,IJT - k$$

degrees of freedom, for S² the estimator of σ² previously derived. In
addition, the latter distribution still applies when disturbance
structure (A) is assumed. An analogous relationship holds when n < IJT
observations are used in the least squares estimation (such a case might
arise when some observations must be discarded for one reason or another).
In this case, for ordinary least squares estimation in the original data
the quantity (n - k)S_0²/σ² has χ² distribution with n - k degrees of
freedom. It is desired to show the analogous distribution (in S²) when
normalized data are used. But when not all observations are allowed,
the method of "normalizing" the remaining observations is not obvious.

The most straightforward approach is to take the appropriate means, in
the normalization process, over those observations that are available.
Then, for example, the normalization of the (i,j,t)th observation on
the dependent variable (which is assumed to be used) still has the form:



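$$y_{ijt} - y_{ij\cdot} - y_{i\cdot t} - y_{\cdot jt} + y_{i\cdot\cdot} + y_{\cdot j\cdot} + y_{\cdot\cdot t} - y_{\cdot\cdot\cdot}$$

where now

$$\begin{aligned}
(*)\qquad y_{ij\cdot} &= \frac{1}{\|T(i,j)\|}\sum_{t \in T(i,j)} y_{ijt} \\
y_{i\cdot t} &= \frac{1}{\|J(i,t)\|}\sum_{j \in J(i,t)} y_{ijt} \\
y_{\cdot jt} &= \frac{1}{\|I(j,t)\|}\sum_{i \in I(j,t)} y_{ijt} \\
y_{i\cdot\cdot} &= \frac{1}{\|J(i,t)\| \cdot \|T(i,j)\|}\sum_{j \in J(i,t)}\ \sum_{t \in T(i,j)} y_{ijt} \\
y_{\cdot j\cdot} &= \frac{1}{\|I(j,t)\| \cdot \|T(i,j)\|}\sum_{i \in I(j,t)}\ \sum_{t \in T(i,j)} y_{ijt} \\
y_{\cdot\cdot t} &= \frac{1}{\|I(j,t)\| \cdot \|J(i,t)\|}\sum_{i \in I(j,t)}\ \sum_{j \in J(i,t)} y_{ijt} \\
y_{\cdot\cdot\cdot} &= \frac{1}{\|I(j,t)\| \cdot \|J(i,t)\| \cdot \|T(i,j)\|}\sum_{i \in I(j,t)}\ \sum_{j \in J(i,t)}\ \sum_{t \in T(i,j)} y_{ijt}
\end{aligned}$$

where, for example, T(i,j) is the set of all years in which the
observations y_ijt, for Rate i and pay grade j, are used and ||T(i,j)|| is
the number of elements in T(i,j). The normalized value of any observation
which is not used in the least squares estimation is taken to be zero.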

The same form applies for normalization of the explanatory variables in
X. With a little reflection it is seen that, in effect, this
normalization process implicitly takes the value of an unused observation
of any variable to be the sum of the appropriate means over observations
which are in fact used. That is, an unused observation y_ijt is taken to
be equal to:

$$y_{ijt} = y_{ij\cdot} + y_{i\cdot t} + y_{\cdot jt} - y_{i\cdot\cdot} - y_{\cdot j\cdot} - y_{\cdot\cdot t} + y_{\cdot\cdot\cdot}$$

where the terms on the right hand side of this equation are as given in
(*) above. In particular, this modified normalization process is applied
to the disturbance terms ε_ijt as well. Let μ represent the n-vector
(n < IJT is the number of observations used) of disturbance terms under
the modified normalization. Define, as in the preceding section,

$$V_0 = \frac{1}{\sigma^2}\,E(\mu\mu'),$$

where the matrix V_0 has order n < IJT. Note that the diagonal element
of V_0 which corresponds to observation (i,j,t) is equal to:

$$\frac{(\|I(j,t)\| - 1)(\|J(i,t)\| - 1)(\|T(i,j)\| - 1)}{\|I(j,t)\| \cdot \|J(i,t)\| \cdot \|T(i,j)\|}$$

since it represents the variance of a component of μ derived through the
modified normalization specified in (*) above. Thus, the trace of V_0 is
equal to:

$$\sum_{i \in \cup I(j,t)}\ \sum_{j \in \cup J(i,t)}\ \sum_{t \in \cup T(i,j)} \frac{(\|I(j,t)\| - 1)(\|J(i,t)\| - 1)(\|T(i,j)\| - 1)}{\|I(j,t)\| \cdot \|J(i,t)\| \cdot \|T(i,j)\|}$$

Note also that V_0 is symmetric and that for an arbitrary n-component
disturbance vector ε, V_0V_0ε = V_0ε, so that V_0 is idempotent. That
this is so is clear since for ε_ij., ε_i.t, ε_.jt, ε_i.., ε_.j., ε_..t and
ε_... as specified in the equations (*), V_0ε_ij. = V_0ε_i.t = V_0ε_.jt =
V_0ε_i.. = V_0ε_.j. = V_0ε_..t = V_0ε_... = 0. The matrix V_0 has
properties analogous to the matrix V considered previously, and represents
the linear transformation which projects an n-vector of observations into
the modified normalization of that vector.

Now let

$$N(n) = \mathrm{tr}(V_0) = \sum_{i \in \cup I(j,t)}\ \sum_{j \in \cup J(i,t)}\ \sum_{t \in \cup T(i,j)} \frac{(\|I(j,t)\| - 1)(\|J(i,t)\| - 1)(\|T(i,j)\| - 1)}{\|I(j,t)\| \cdot \|J(i,t)\| \cdot \|T(i,j)\|}$$

and let M_0 = V_0 - V_0X(X'V_0X)^{-1}X'V_0, where X is now the n x k
matrix of observations which results from removing the IJT - n unused
observations from the original IJT x k matrix of observations X. Then the
error sum of squares for the least squares estimation in modified
normalized form of the data (with unused observations removed) is
e'e = ε'M_0ε, where M_0 is an idempotent matrix of rank N(n) - k. That M_0
is idempotent is clear since

$$\begin{aligned}
M_0 M_0 &= [V_0 - V_0X(X'V_0X)^{-1}X'V_0]\,[V_0 - V_0X(X'V_0X)^{-1}X'V_0] \\
&= V_0 - V_0X(X'V_0X)^{-1}X'V_0 - V_0X(X'V_0X)^{-1}X'V_0 + V_0X(X'V_0X)^{-1}X'V_0X(X'V_0X)^{-1}X'V_0 \\
&= V_0 - V_0X(X'V_0X)^{-1}X'V_0 = M_0 .
\end{aligned}$$

And M_0 has trace (hence rank) N(n) - k since:

$$\mathrm{tr}(M_0) = \mathrm{tr}[V_0 - V_0X(X'V_0X)^{-1}X'V_0] = \mathrm{tr}(V_0) - \mathrm{tr}[V_0X(X'V_0X)^{-1}X'V_0] = \mathrm{tr}(V_0) - \mathrm{tr}[X'V_0X(X'V_0X)^{-1}] = N(n) - k .$$

Hence for disturbance term ε specified by:

$$\varepsilon_{ijt} = \eta_{ijt} + \alpha_i + \gamma_j + \delta_t + \lambda_{ij} + \omega_{it} + \pi_{jt}$$

where the η_ijt are independent identically distributed Normal random
variables with mean zero and variance σ², (1/σ²)e'e has χ² distribution
with N(n) - k degrees of freedom. Thus, for the estimator:

$$S^2 = \frac{e'e}{N(n) - k} = \frac{\varepsilon' M_0\,\varepsilon}{N(n) - k}$$

of σ², [N(n) - k]S²/σ² has χ² distribution with N(n) - k degrees of
freedom.

For those cases in which the removal of observations is not systematic
(that is, when observations are discarded in no regular pattern),
computation of N(n) may involve many computations and may require that one
keep track of a large number of values of ||I(j,t)||, ||J(i,t)|| and
||T(i,j)||. It may, therefore, be beneficial to derive the distribution of
an alternative random variable linear in S². The quantity:

$$\frac{\dfrac{(I-1)(J-1)(T-1)\,n}{IJT} - k}{N(n) - k}\cdot[N(n) - k]\,\frac{S^2}{\sigma^2} = \Big[\frac{(I-1)(J-1)(T-1)\,n}{IJT} - k\Big]\frac{S^2}{\sigma^2}$$

is linear in

$$[N(n) - k]\,\frac{S^2}{\sigma^2},$$

hence has χ² distribution, with degrees of freedom given by:

$$E\Big\{\Big[\frac{(I-1)(J-1)(T-1)\,n}{IJT} - k\Big]\frac{S^2}{\sigma^2}\Big\} = \frac{(I-1)(J-1)(T-1)\,n}{IJT} - k .$$

Thus the analogy is completed.
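The two degrees-of-freedom counts can be compared on a toy data set. The sketch below assumes that N(n) is the sum, over the observations actually used, of the diagonal elements of V_0 given earlier, and that ||I(j,t)||, ||J(i,t)|| and ||T(i,j)|| count the used observations sharing the indicated pair of indices; the function names `trace_v0` and `approx_trace` are illustrative only.

```python
from fractions import Fraction
from itertools import product


def trace_v0(used):
    """N(n): sum over used cells of (|I|-1)(|J|-1)(|T|-1) / (|I| |J| |T|)."""
    total = Fraction(0)
    for (i, j, t) in used:
        n_i = sum(1 for c in used if c[1] == j and c[2] == t)  # ||I(j,t)||
        n_j = sum(1 for c in used if c[0] == i and c[2] == t)  # ||J(i,t)||
        n_t = sum(1 for c in used if c[0] == i and c[1] == j)  # ||T(i,j)||
        total += Fraction((n_i - 1) * (n_j - 1) * (n_t - 1), n_i * n_j * n_t)
    return total


def approx_trace(shape, n):
    """The simpler quantity (I-1)(J-1)(T-1) n / IJT used in place of N(n)."""
    I, J, T = shape
    return Fraction((I - 1) * (J - 1) * (T - 1) * n, I * J * T)


shape = (2, 2, 2)
full = set(product(*(range(s) for s in shape)))
partial = full - {(1, 1, 1)}          # discard a single observation

exact_full, approx_full = trace_v0(full), approx_trace(shape, len(full))
exact_partial, approx_partial = trace_v0(partial), approx_trace(shape, len(partial))
```

With all observations present the two counts coincide at (I-1)(J-1)(T-1). In this very small example they differ once an observation is dropped, so the simpler expression is better read as a convenient stand-in for N(n) than as an identity.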

I. GENERALIZATION TO q CROSS-SECTIONS 

There is a natural generalization of all of the preceding sections
to the case in which q cross-sectional dimensions are involved.
Previously, recall, all was described in terms of three cross-sectional
dimensions.

Suppose q cross-sectional dimensions are being considered in the
model Y = Xβ + ZΩ + ε. Analogously to the case for q = 3, let the
variables whose observations are represented by X and Y vary over all
q dimensions, and let each variable in Z vary over at most q - 1
dimensions. Also let the disturbance term ε be constructed analogously to
the previously considered case, q = 3. That is, for q cross-sectional
dimensions, with respective numbers of categories I_1,...,I_q, ε is a
linear combination of 2^q - 1 random vectors, one of which varies over q
cross-sectional dimensions (let this single random vector be denoted as η,
as before, where the elements of η are written with q subscripts) and the
remaining 2^q - 2 of which vary over at most q - 1 dimensions (that is,
the elements of each of these remaining random vectors are written with
fewer than q subscripts). Also, the elements of η are independent,
identically distributed Normal random variables, each with mean zero and
variance σ², and the remaining 2^q - 2 random variables are subject to any
unknown distributions, and to any unknown conditions of stochastic
non-independence.

All the properties that have been derived in preceding sections
flowed naturally from a knowledge of the idempotent matrix V. Thus, in
order to characterize the general case for q cross-sectional dimensions,
it is only necessary to find the appropriate matrix V_q whose properties
are analogous to those of the previously defined V. To this end, let
C_ij be the subscript (in the notational expression for the elements of
η; there are q such subscripts in the notational expression for each
element of η) representing the i-th category of the j-th cross-sectional
dimension, j = 1,...,q, i = 1,...,I_j.

Then the elements of V_q = (1/σ²)E(μμ') are given, for i_r = 1,...,I_r
and j_r = 1,...,I_r, r = 1,...,q, by

$$(V_q)_{(C_{i_1 1},\ldots,C_{i_q q}),\ (C_{j_1 1},\ldots,C_{j_q q})} = \frac{(-1)^{q-p}}{\prod_{k=1}^{q} I_k}\ \prod_{m \in S} (I_m - 1)$$

where:

$$S = \{\, m : C_{i_m m} = C_{j_m m} \,\}$$

and p is the number of elements in S. When S is empty, define

$$\prod_{m \in S} (I_m - 1) = 1 .$$

That is: S is the set of all cross-sectional dimensions for which the
subscripts C_{i_m m} and C_{j_m m} are equal in the variables
μ_{C_{i_1 1} ... C_{i_q q}} and μ_{C_{j_1 1} ... C_{j_q q}}, whose
covariance is an element of V_q. Or: S is the set of all cross-sectional
dimensions for which the above two random variables correspond to the same
category. Note that the set S depends on the two elements of μ whose
covariance is being considered.

To complete the analogy to the case q = 3, V_q is an idempotent matrix
of order

$$\prod_{k=1}^{q} I_k$$

and trace (= rank)

$$\prod_{k=1}^{q} (I_k - 1) .$$

J. THE INAPPROPRIATELY APPLIED MODEL: A CASE IN WHICH DISTURBANCE
STRUCTURE (A) DOES NOT APPLY

Before proceeding with this section, it may be instructive to amplify
on the derivation of the transformation V. Note that the originally
stated purpose of the transformation V was to rid the model Y = Xβ + ZΩ
+ ε of the effects of certain unobserved or unobservable explanatory
variables. The disturbance structure (A) hypothesized in Section D
was constructed, more or less artificially, to take advantage of the
properties of V. Disturbance structure (A) is simply the most general case
of the original problem: it contains all possible sources of error which
the transformation V is able to remove. Consider a model of the form
Y = Xβ + ZΩ + ε as previously introduced. Then the following statements
are equivalent:

a. ε obeys disturbance structure (A).

b. The elements of ε are independent, identically distributed Normal
random variables, each with mean zero and variance σ², and
included in the specification of the model (specifically, in Z)
is any variable (observed or not) which may be written as
varying over fewer than q cross-sectional dimensions (q is the total
number of dimensions involved in the data).

c. No knowledge or information about the disturbance term ε may
reasonably be assumed except that at least one component of each
ε_ijt is a sample from a Normal population with mean zero and
variance σ².

This situation suggests two useful observations. The first concerns
the unobserved or unobservable explanatory variables which, by the
dictates of theory (that is, theory relating to the subject being modeled)
or other considerations, are necessarily included in some model of the
form considered here. Since the transformation V rids the model of these
variables in any case (as long as each of them varies over fewer than q
cross-sectional dimensions, where q is the total number of dimensions
involved), it is conceptually and practically equivalent whether these
variables are explicitly included in the formal statement of the model or
implicitly "thrown into" the disturbance term. This is a trite
observation, but it is well worth noting for the following reason: some
studies and analyses (see, for example, Nerlove [8]), when implicitly
including an unobserved or unobservable explanatory variable as a
component of the disturbance term, make a strong and possibly erroneous⁷
assumption in order to complete the regression analysis (that is, in
order to be able to claim an unbiased

⁷ The term "erroneous" should be seen in context. The case of interest
here is that in which there exists some unobserved explanatory variable
which is expected to have a significant effect on the dependent
variable. In addition, it is supposed that the analyst has no (or
does not care to get any) information about the values of this
variable. Such a variable may indeed not even be quantifiable.
estimator of the regression coefficients) without using some
transformation such as V to purge the model of the offending variable.
Specifically, the required⁸ assumption is that the disturbance term (which
now implicitly includes unobserved and unobservable explanatory variables)
has known mean, usually zero. [It is further typically assumed that the
disturbance term is Normally distributed, although this assumption is not
necessary if all one wishes to do is ensure that the estimator is
unbiased.] That this assumption may be erroneous can be seen by
considering the two ways in which it may be adopted. One may simply make
this assumption with no justification. But since theory, or other
consideration, has dictated that the unobserved explanatory variable does
have an effect on the dependent variable, the original problem still
remains. And the resolution to that problem is still to remove the
offending explanatory variable (whether explicitly included in the model
or implicitly included as a component of the disturbance term) by some
transformation such as V.⁹ Alternatively, one may attempt to justify the
assumption by means of some device such as the Central Limit Theorem, in
this case making the additional assumption that the components of the
disturbance term, which now includes the unobserved explanatory
variables, are independent. Ignoring for the moment



⁸ This assumption is characterized as "required" since, unless it is
made, some unobserved explanatory variable is, in effect, still
being considered an explicit term in the model.

⁹ Note that V may not be unique in this respect. For example, in
the model

$$y_{it} = \alpha + \beta X_{it} + \gamma Z_i + \varepsilon_{it}$$

where one wishes to purge $Z_i$, the transformation W may be used:

$$W[y_{it}] = [y_{it} - y_{i\cdot}], \quad W[X_{it}] = [X_{it} - X_{i\cdot}], \quad W[Z_i] = [Z_i - Z_i] = 0,$$

$$W[\varepsilon_{it}] = [\varepsilon_{it} - \varepsilon_{i\cdot}], \quad W[\alpha] = [\alpha - \alpha] = 0.$$

Here $[P_{it}]$ is an n-vector whose elements are $P_{it}$.
the fact that this latter assumption is contrary to the assumptions of
disturbance structure (A), this sort of argument may be reasonable in
some cases. But in justifying the application of the Central Limit
Theorem, in order to approximate a Normal random variable of known mean
by a sum of random variables, one typically assumes that the disturbance
term represents the net effect of numerous individually unimportant but
collectively significant variables. This is clearly not the case (at
least, this latest assumption cannot reasonably be made) when disturbance
structure (A) pertains. More generally, there are certainly studies of
interest where this is not the case: the unobserved explanatory variable
whose inclusion in the model was a necessity cannot in general be assumed
not to dominate the disturbance term in which it is incorporated. In
summary, there exist studies for which the use of a transformation such
as V, to rid the model of undesired variables, is unavoidable if an
unbiased estimator of the regression coefficients is to be obtained.
Simply discarding an undesired variable as a component of a disturbance
term with known mean should be viewed cautiously. As
an example, in the reenlistment model, the inclusion of the two
unobserved terms in the disturbance term can be expected to have a large
effect on the disturbance term.

The second observation concerns the best linear unbiasedness of the
estimator $B = (X'VX)^{-1}X'VY$ for $\beta$ in
$Y = X\beta + Z\Omega + \varepsilon$. Recall that when disturbance
structure (A) is assumed, B is the best linear unbiased estimator for
$\beta$. Note that since, in disturbance structure (A), the random
variables $\alpha$, $\lambda$, $\omega$ and $\eta$ may assume any
(unknown) distribution, and since any error terms in the model (except
the $\eta$'s) may be interdependent, disturbance structure (A) is more
general than that typically assumed (specifically, that error structure
in which the elements of the disturbance term $\varepsilon$ are
independent, identically distributed Normal random variables, each with
mean zero and variance $\sigma^2$). But it is not a generalization of
this latter error structure: the latter is not a special case of
disturbance structure (A). This is so since disturbance structure (A) is
based on a certain lack of specific information or knowledge about the
characteristics of the components of the disturbance term. As a
consequence, if the error structure which one wishes to assume is not
that specified by disturbance structure (A), then $B = (X'VX)^{-1}X'VY$
is not necessarily the best linear unbiased estimator for $\beta$ in
$Y = X\beta + Z\Omega + \varepsilon$.



This latest observation leads into the proper subject of this
section: a consideration of a common case in which B is not the best
linear unbiased estimator for $\beta$. For consistency of approach,
suppose that the model is written in the form $Y = X\beta + \varepsilon$,
where any unobserved or unobservable explanatory variables which were
previously included in Z are now included in the disturbance term
$\varepsilon$. As has been seen, $B = (X'VX)^{-1}X'VY$ is the best linear
unbiased estimator for $\beta$ when $\varepsilon$ obeys disturbance
structure (A). Consider the asymptotic properties of the matrix V in
three cross-sectional dimensions. As the numbers of categories, I, J and
T, in the three cross-sectional dimensions go to infinity, the elements
of V behave as follows:



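$$
\begin{aligned}
\frac{(I-1)(J-1)(T-1)}{IJT} &= \left(1-\frac{1}{I}\right)\left(1-\frac{1}{J}\right)\left(1-\frac{1}{T}\right) \to 1, \\
-\frac{(I-1)(J-1)}{IJT} &= -\frac{1}{T}\left(1-\frac{1}{I}\right)\left(1-\frac{1}{J}\right) \to 0, \\
-\frac{(I-1)(T-1)}{IJT} &= -\frac{1}{J}\left(1-\frac{1}{I}\right)\left(1-\frac{1}{T}\right) \to 0, \\
-\frac{(J-1)(T-1)}{IJT} &= -\frac{1}{I}\left(1-\frac{1}{J}\right)\left(1-\frac{1}{T}\right) \to 0, \\
\frac{I-1}{IJT} &= \frac{1}{JT}\left(1-\frac{1}{I}\right) \to 0, \\
\frac{J-1}{IJT} &= \frac{1}{IT}\left(1-\frac{1}{J}\right) \to 0, \\
\frac{T-1}{IJT} &= \frac{1}{IJ}\left(1-\frac{1}{T}\right) \to 0, \\
-\frac{1}{IJT} &\to 0.
\end{aligned}
$$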



[Note that when q cross-sectional dimensions are considered, the number
of unique elements in V is $2^q$, since each element of V depends on the
comparison of the subscripts of two random variables, each of which has
q subscripts. These two random variables may either agree or disagree
in each subscript. For q = 3, then, V has $2^3 = 8$ unique elements.]
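
The element pattern can be checked numerically. The following is a minimal sketch, assuming that V is the three-way interaction projector implied by the cell-level form of VY (a Kronecker product of three mean-subtracting matrices); the function names `centering`, `build_v` and `diagonal_element` are illustrative and do not come from the text.

```python
import numpy as np

def centering(n):
    """Return the n x n matrix I - (1/n) * ones, which subtracts a mean."""
    return np.eye(n) - np.ones((n, n)) / n

def build_v(I, J, T):
    """Assumed form of V: Kronecker product of three centering matrices.

    Rows and columns are ordered with i varying slowest and t fastest.
    """
    return np.kron(np.kron(centering(I), centering(J)), centering(T))

def diagonal_element(I, J, T):
    """Closed form of the diagonal element, (I-1)(J-1)(T-1)/(IJT)."""
    return (I - 1) * (J - 1) * (T - 1) / (I * J * T)

# Small example with unequal dimensions so the element values differ.
I, J, T = 3, 4, 5
V = build_v(I, J, T)

# Round before collecting distinct values to absorb floating-point noise.
distinct = np.unique(np.round(V, 12))
print("number of distinct element values:", len(distinct))
print("diagonal element:", V[0, 0], "closed form:", diagonal_element(I, J, T))

# The diagonal element moves toward 1 as the dimensions grow together.
for n in (5, 50, 500):
    print(n, diagonal_element(n, n, n))
```

Only the closed form is evaluated for the larger dimensions, since the full matrix has (IJT)² entries.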

In other words, as these limits show, the diagonal elements of V approach
unity and all other elements of V approach zero. Or, as I, J and T
increase without bound, V tends to the identity matrix. As a consequence,
$(X'VX)^{-1}X'VY$ approaches $(X'X)^{-1}X'Y$ as I, J and T become
infinitely large. Hence, in the case that $\varepsilon$ obeys disturbance
structure (A), the ordinary least squares estimator
$\hat{\beta} = (X'X)^{-1}X'Y$ is in the limit (in I, J and T) an unbiased
estimator for $\beta$, since it is the limit of a sequence of unbiased
estimators.¹⁰ This suggests that, for sufficiently large I, J and T,
the ordinary least squares estimator for $\beta$,
$\hat{\beta} = (X'X)^{-1}X'Y$, could serve to approximate the best linear
unbiased estimator B when disturbance structure (A) holds. This line of
thought will not be pursued: it is the converse suggestion, that B can
serve to approximate $\hat{\beta}$ for sufficiently large I, J and T,
that is more interesting here. Suppose that the transformation V was
inappropriately applied to the model $Y = X\beta + \varepsilon$.
Specifically, suppose that the components of $\varepsilon$ are
independent, identically distributed Normal random variables with mean
zero and variance $\sigma^2$. Call this disturbance structure (B). Then
the ordinary least squares estimator $\hat{\beta} = (X'X)^{-1}X'Y$ is the
best linear unbiased estimator for $\beta$. Note that
$B = (X'VX)^{-1}X'VY$ is still an unbiased estimator for $\beta$, but it
is no longer best. But since V approaches the identity matrix as I, J
and T increase, the less efficient estimator B approaches $(X'X)^{-1}X'Y$
as well. This suggests a pragmatic comparative scheme for the two
estimators B and $\hat{\beta}$:



¹⁰ In treating a subject related to that considered here, Wallace and
Hussain [9] have shown the asymptotic equivalence of the Aitken
estimator and an estimator derived under a linear transformation
(much as B was derived from the linear transformation V) for a
particular error structure. In the disturbance structure considered
in their paper, the disturbance term was assumed to be a sum of
independent random variables (in a combined time series and
cross-section analysis),

$$\varepsilon_{it} = \alpha_i + \gamma_t + \eta_{it},$$

and $\mathrm{Var}(\alpha_i) = \sigma_1^2$, $\mathrm{Var}(\gamma_t) = \sigma_2^2$,
$\mathrm{Var}(\eta_{it}) = \sigma_3^2$ for all i, t, where $\sigma_1^2$,
$\sigma_2^2$ and $\sigma_3^2$ were known.

The paper also showed the equivalence of the iterative Aitken
estimator and the estimator derived under a linear transformation for
the disturbance structure as above with $\sigma_1^2$, $\sigma_2^2$ and
$\sigma_3^2$ unknown.
1. Suppose disturbance structure (A) applies. Then $\hat{\beta}$ is
biased, and B is the best linear unbiased estimator and should
reasonably be used.

2. Suppose on the other hand that disturbance structure (B) is
assumed to hold. Then $\hat{\beta}$ and B are both unbiased estimators,
although B is less efficient than $\hat{\beta}$. But note that B has an
advantage which may offset (on a case-by-case basis) its lesser
efficiency: it is guaranteed to purge all random variables which
are invariant over at least one cross-sectional dimension. That
is, if one is unsure of the validity of the assumption that
disturbance structure (B) holds, then one may see some value in
applying the transformation V in order to rid the model of all
such possible sources of error.
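
The first case of this scheme can be illustrated with a small numerical sketch. It assumes a single observed regressor, a hypothetical unobserved variable that is constant over the third dimension and correlated with the regressor, and a noise-free response so that the two estimators can be compared exactly. The coefficient values, the dimensions and the Kronecker-product form of V are illustrative assumptions, not quantities taken from the reenlistment data.

```python
import numpy as np

def centering(n):
    """Return the n x n matrix that subtracts the mean over one dimension."""
    return np.eye(n) - np.ones((n, n)) / n

rng = np.random.default_rng(0)
I, J, T = 4, 5, 6
beta = 2.0                      # illustrative true coefficient

# Observed regressor: varies over all three dimensions.
x = rng.normal(loc=1.0, size=(I, J, T))

# Hypothetical unobserved variable: constant over t and correlated with x.
z_ij = x.mean(axis=2)
z = np.repeat(z_ij[:, :, None], T, axis=2)

# Noise-free response, so any gap between estimate and beta is due to z.
y = beta * x + 3.0 * z

X = x.reshape(-1, 1)            # i slowest, t fastest, matching V below
Y = y.reshape(-1)
V = np.kron(np.kron(centering(I), centering(J)), centering(T))

# Ordinary least squares, ignoring the unobserved variable.
ols = np.linalg.solve(X.T @ X, X.T @ Y)[0]

# Estimator B = (X'VX)^{-1} X'VY.
B = np.linalg.solve(X.T @ V @ X, X.T @ V @ Y)[0]

print("OLS estimate:", ols)
print("B estimate:  ", B)
print("largest |Vz| entry:", np.abs(V @ z.reshape(-1)).max())
```

In this construction the unobserved variable is annihilated by V, so B recovers the illustrative coefficient up to roundoff while the ordinary least squares estimate absorbs part of the effect of z.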

Two concluding observations should now be made. First, it is clear
that application of the transformation V is equally inappropriate in all
other cases where disturbance structure (A) does not hold in the model
$Y = X\beta + \varepsilon$. An important special case is that in which
the generalized least squares estimator for $\beta$ is appropriate. Just
as the ordinary least squares estimator $\hat{\beta} = (X'X)^{-1}X'Y$ is
the best linear unbiased estimator for $\beta$ when $E(\varepsilon) = 0$
and $\mathrm{Var}(\varepsilon) = \sigma^2 I$, the Aitken estimator
$(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y$ is the best linear unbiased
estimator for $\beta$ for the case in which $E(\varepsilon) = 0$ and
$\mathrm{Var}(\varepsilon) = \sigma^2\Omega$.

Finally, it is worth repeating the crucial condition which underlies
the specification of the case in which the transformation V is effective.
In the model $Y = X\beta + Z\Omega + \varepsilon$ (or in the model
$Y = X\beta + \varepsilon$, equivalent under the transformation V, where
the variables in Z are thrown into the disturbance term $\varepsilon$),
V is effective in removing unobserved or unobservable variables
(stochastic or deterministic) only if these variables are invariant over
at least one cross-sectional dimension. Accordingly, all work in this
paper is performed under the assumption that each variable in X (those
variables which vary over all cross-sectional dimensions) has been
observed.



K. INTERPRETATION OF TERMS UNDER THE TRANSFORMATION V 

Consider the model in the form $Y = X\beta + \varepsilon$, in three
cross-sectional dimensions. The equation representing the data in the
$i$th category of the first cross-sectional dimension, the $j$th category
of the second dimension and the $t$th category of the third dimension is
$y_{ijt} = x_{ijt}\beta + \varepsilon_{ijt}$, where $x_{ijt}$ is a
k-vector of observations on the k explanatory variables in X. The
categories of the cross-sectional dimensions corresponding to the
observations $y_{ijt}$ and $x_{ijt}$ may be considered to be "treatments"
which affect the values of the observations of $y_{ijt}$ and $x_{ijt}$ in
the (i, j, t) "cell". With this in mind, assume that each $y_{ijt}$ and
$x_{ijt}$ can be represented as a sum of common mean, effects due to
single treatments (here i, j, t represent the "treatments"), two-way
interaction effects of pairs of treatments, and a three-way interaction
effect of the three treatments. [Note that since there is only one
observation (on each of $y_{ijt}$ and $x_{ijt}$) per "cell", it is
generally not possible to distinguish between the effect of the three-way
interaction term and the error term. In this case, however, it is known
that a three-way interaction term does in fact exist. That this is so can
be seen as follows: since $x_{ijt}$ is deterministic, one can calculate
the exact three-way interaction effect for cell (i, j, t) as
$x_{ijt} - x_{ij\cdot} - x_{i\cdot t} - x_{\cdot jt} + x_{i\cdot\cdot} + x_{\cdot j\cdot} + x_{\cdot\cdot t} - x_{\cdot\cdot\cdot}$,
subject only to roundoff error (this expression is the same as that of a
sample estimate of the three-way interaction effect for the case of
stochastic $x_{ijt}$). This is not identically zero (for all cells), by
previous hypothesis about the variables in X, so a three-way interaction
effect is present. And since $y_{ijt}$ is a linear function of $x_{ijt}$,
$y_{ijt}$ also includes a three-way interaction effect.]

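That is:

$$y_{ijt} = \mu + A_i + B_j + C_t + D_{ij} + E_{it} + F_{jt} + \theta_{ijt} + \varepsilon_{ijt}$$

$$x_{ijt} = \mu^0 + A_i^0 + B_j^0 + C_t^0 + D_{ij}^0 + E_{it}^0 + F_{jt}^0 + \theta_{ijt}^0$$

where $\theta_{ijt}$ and $\theta_{ijt}^0$ are the three-way interaction
terms mentioned above. Substituting these into the model
$y_{ijt} = x_{ijt}\beta + \varepsilon_{ijt}$:

$$\mu + A_i + B_j + C_t + D_{ij} + E_{it} + F_{jt} + \theta_{ijt} + \varepsilon_{ijt} = (\mu^0 + A_i^0 + B_j^0 + C_t^0 + D_{ij}^0 + E_{it}^0 + F_{jt}^0 + \theta_{ijt}^0)\beta + \varepsilon_{ijt}$$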

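These effects can be equated term by term to give:

$$
\begin{aligned}
\mu &= \mu^0\beta \\
A_i &= A_i^0\beta \\
B_j &= B_j^0\beta \\
C_t &= C_t^0\beta \\
D_{ij} &= D_{ij}^0\beta \\
E_{it} &= E_{it}^0\beta \\
F_{jt} &= F_{jt}^0\beta
\end{aligned}
$$

and:

$$\theta_{ijt} = \theta_{ijt}^0\beta \qquad (*)$$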
Now consider the data under the transformation V:
$VY = VX\beta + V\varepsilon$. In the (i, j, t)th cell this gives:

$$y_{ijt} - y_{ij\cdot} - y_{i\cdot t} - y_{\cdot jt} + y_{i\cdot\cdot} + y_{\cdot j\cdot} + y_{\cdot\cdot t} - y_{\cdot\cdot\cdot} = (x_{ijt} - x_{ij\cdot} - x_{i\cdot t} - x_{\cdot jt} + x_{i\cdot\cdot} + x_{\cdot j\cdot} + x_{\cdot\cdot t} - x_{\cdot\cdot\cdot})\beta + (V\varepsilon)_{ijt},$$

where $(V\varepsilon)_{ijt}$ is the (i, j, t)th element of $V\varepsilon$.

Note that the left hand side of this equation is the sample estimate
of the three-way interaction term $\theta_{ijt}$. And the term in
parentheses on the right hand side is the three-way interaction term
$\theta_{ijt}^0$. This is the relationship specified in (*) above, with a
sample estimate for $\theta_{ijt}$ replacing $\theta_{ijt}$ and with a
disturbance term $(V\varepsilon)_{ijt}$ included. That is, under the
assumption that $y_{ijt}$ and $x_{ijt}$ can each be represented as a sum
of common mean, effects due to single treatments, two-way interaction
effects of pairs of treatments, and a three-way interaction effect, it is
true that $\theta_{ijt} = \theta_{ijt}^0\beta$. Hence $\beta$ can be
estimated by regressing the sample estimate of the three-way interaction
term $\theta_{ijt}$ on the three-way interaction term $\theta_{ijt}^0$.
This is precisely what the estimator $B = (X'VX)^{-1}X'VY$ accomplishes.
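
This interpretation can be checked numerically. The sketch below assumes simulated data with two regressors and illustrative coefficients, and again takes V to be the Kronecker product of three mean-subtracting matrices. It computes the three-way interaction residuals by subtracting and adding cell means, then confirms that regressing one residual on the other reproduces B. The helper name `three_way_interaction` is hypothetical.

```python
import numpy as np

def centering(n):
    """Return the n x n matrix that subtracts the mean over one dimension."""
    return np.eye(n) - np.ones((n, n)) / n

def three_way_interaction(a):
    """Three-way interaction residual over the first three axes of a.

    Computes a_ijt - a_ij. - a_i.t - a_.jt + a_i.. + a_.j. + a_..t - a_...
    where a dot denotes a mean over the corresponding index.
    """
    def mean(*axes):
        return a.mean(axis=axes, keepdims=True)
    return (a - mean(2) - mean(1) - mean(0)
            + mean(1, 2) + mean(0, 2) + mean(0, 1) - mean(0, 1, 2))

rng = np.random.default_rng(1)
I, J, T, k = 3, 4, 5, 2
x = rng.normal(size=(I, J, T, k))            # k regressors per cell
beta = np.array([1.5, -0.5])                 # illustrative coefficients
y = x @ beta + rng.normal(scale=0.1, size=(I, J, T))

X = x.reshape(-1, k)                         # i slowest, t fastest
Y = y.reshape(-1)
V = np.kron(np.kron(centering(I), centering(J)), centering(T))

# Estimator from the text: B = (X'VX)^{-1} X'VY.
B = np.linalg.solve(X.T @ V @ X, X.T @ V @ Y)

# Regress the interaction residual of y on the interaction residual of x.
theta_y = three_way_interaction(y).reshape(-1)
theta_x = three_way_interaction(x).reshape(-1, k)
B_interaction, *_ = np.linalg.lstsq(theta_x, theta_y, rcond=None)

print("B from V:              ", B)
print("B from interaction fit:", B_interaction)
```

The two estimates agree because V is symmetric and idempotent, so X'VX equals (VX)'(VX) and X'VY equals (VX)'(VY).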